Objectives To develop and evaluate an explainable machine learning framework, enhanced with synthetic data generation, to predict unplanned 30-day hospital readmissions among patients with chronic obstructive pulmonary disease (COPD), heart failure (HF) and type 2 diabetes mellitus (T2DM), and to identify key clinical and social predictors of readmission.
Design A retrospective cohort study using electronic health record data, incorporating both structured variables and information extracted from unstructured clinical notes. Synthetic data were generated using advanced resampling and deep learning-based techniques to address outcome imbalance and improve model training.
Setting Intensive care unit and general ward admissions at a single tertiary academic medical centre included in the MIMIC-IV (Medical Information Mart for Intensive Care IV) database.
Participants Adult patients (≥18 years) admitted with a primary diagnosis of COPD (n=14 050), HF (n=7097) or T2DM (n=12 735) between 2008 and 2019, with complete 30-day follow-up and no in-hospital mortality during the index admission.
Primary and secondary outcome measures The primary outcome was unplanned all-cause hospital readmission within 30 days of discharge.
Model interpretability relied on established global and local explanation approaches.
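As an illustration of the primary outcome definition (unplanned all-cause readmission within 30 days of discharge), the following is a minimal sketch of how index admissions might be labelled. It is not the study's code: the record fields (`patient_id`, `admit`, `discharge`, `planned`) and the function name are hypothetical, and the way a planned admission is identified in the source data is an assumption left to the caller. Cohort exclusions such as in-hospital death are not handled here.

```python
from datetime import datetime, timedelta


def label_30day_readmissions(admissions, window_days=30):
    """Flag each admission that is followed by an unplanned readmission.

    `admissions` is a list of dicts with hypothetical keys:
    patient_id, admit (datetime), discharge (datetime), planned (bool).
    Returns a dict mapping (patient_id, admit) -> bool.
    """
    window = timedelta(days=window_days)

    # Group stays by patient so each index stay is compared only
    # with later stays of the same patient.
    by_patient = {}
    for adm in admissions:
        by_patient.setdefault(adm["patient_id"], []).append(adm)

    labels = {}
    for pid, stays in by_patient.items():
        stays.sort(key=lambda a: a["admit"])
        for i, index_stay in enumerate(stays):
            # All-cause: any later unplanned stay counts, whatever the diagnosis.
            readmitted = any(
                not later["planned"]
                and timedelta(0) <= later["admit"] - index_stay["discharge"] <= window
                for later in stays[i + 1:]
            )
            labels[(pid, index_stay["admit"])] = readmitted
    return labels
```

Each admission is treated as a potential index stay, so a patient with several admissions contributes several labelled rows; whether the study handled repeat admissions this way is not stated in the abstract.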