Background: Older adults are a high-risk group for COVID-19-related death. This study aimed to develop an accurate, efficient, and clinically interpretable machine learning (ML) model for predicting mortality risk in this population, using only routine hematological indicators at admission to avoid extra medical costs and radiation exposure.
Methods: A total of 2393 COVID-19 patients were enrolled in this retrospective study.
Missing values were imputed via Random Forest. RandomOverSampler was utilized during model training to alleviate moderate class imbalance.
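As an illustration of the oversampling step, the sketch below shows the general idea behind random oversampling: minority-class rows are duplicated at random until the classes are balanced. It is a minimal standard-library sketch, not the study's pipeline; the function name `random_oversample` and the toy values are hypothetical, and in practice resampling would be applied to the training split only.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class. Hypothetical helper."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy training data (invented values): 6 survivors (0) and 2 deaths (1).
X_train = [[61, 12.0], [70, 8.5], [66, 20.1], [82, 95.0],
           [75, 15.2], [68, 9.9], [88, 140.3], [79, 33.0]]
y_train = [0, 0, 0, 1, 0, 0, 1, 0]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

The original rows are kept unchanged and only duplicates are appended, so no synthetic patients are created.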
Feature selection was conducted following the maximum relevance-minimum redundancy (mRMR) principle. Five ML algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), XGBoost (XGB), and Light Gradient Boosting Machine (LGBM), were optimized via Bayesian optimization (BO).
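The maximum relevance-minimum redundancy principle can be sketched as a greedy loop: at each step, pick the feature that is most associated with the outcome and least associated with the features already chosen. The abstract does not state which association measure was used, so the sketch below assumes absolute Pearson correlation; the function names and the binary toy features are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0

def mrmr(features, target, k):
    """Greedy selection: score = relevance to target minus mean redundancy
    with already selected features (assumed: absolute Pearson correlation)."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
    return selected

# Toy data (invented): two informative flags plus an exact rescaled duplicate.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "crp_high":     [1, 0, 0, 0, 1, 1, 1, 1],
    "crp_high_dup": [2, 0, 0, 0, 2, 2, 2, 2],  # same information as crp_high
    "ddimer_high":  [0, 0, 0, 0, 0, 1, 1, 1],
}
selected = mrmr(features, target, k=2)
print(selected)
```

On this toy input the duplicate is penalized for redundancy, so the two selected features never include both `crp_high` and its rescaled copy.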
We performed 10 rounds of random stratified data splitting; models were fitted on the training set, with intermediate screening and hyperparameter optimization implemented on the validation set. The independent held-out test set was strictly reserved for final performance evaluation.
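The repeated stratified splitting can be sketched as follows. The split proportions are not given in the abstract, so the 60/20/20 fractions here are an assumption, as are the function name `stratified_split` and the toy label vector. Each round uses a different seed and keeps the class ratio roughly equal across the training, validation, and test subsets.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve class proportions.
    The fractions are an assumed example, not values from the study."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class, then slice
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test

# Toy labels (invented): 80 survivors (0) and 20 deaths (1).
labels = [0] * 80 + [1] * 20

# Ten rounds, each with its own seed; the test indices are only used
# for the final evaluation of the model chosen on the validation set.
splits = [stratified_split(labels, seed=r) for r in range(10)]
for train, val, test in splits[:2]:
    print(len(train), len(val), len(test))
```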
Model performance was assessed using the receiver operating characteristic (ROC) curve, accuracy, precision, recall, F1-score, and Brier score. Calibration curves evaluated concordance between predicted probabilities and actual outcomes, and decision curve analysis (DCA) quantified net clinical benefit.
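For readers less familiar with these measures, the sketch below computes the threshold-based metrics, the Brier score, and the net benefit used in decision curve analysis from predicted probabilities. The definitions are the standard ones; the function names, the 0.5 cutoff, and the toy predictions are assumptions made for illustration and do not reproduce the study's results.

```python
def confusion(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN when predicting 1 for probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
    }

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def net_benefit(y_true, y_prob, pt):
    """DCA net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    tp, fp, _, _ = confusion(y_true, y_prob, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)

# Toy outcomes and predicted death probabilities (invented values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

metrics = classification_metrics(y_true, y_prob)
print(metrics, brier_score(y_true, y_prob), net_benefit(y_true, y_prob, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the model against the treat-all and treat-none strategies.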
SHapley Additive exPlanations (SHAP) values and partial dependence plots (PDPs) were used to interpret feature importance and the associations between features and mortality risk. A simplified model was further developed using the top 10 key features identified by SHAP analysis, and a corresponding risk prediction system was constructed to facilitate clinical application by physicians.
Results: The LGBM model achieved the best overall performance: AUC of 0.973, recall of 0.924, accuracy of 0.918, F1-score of 0.918, NPV of 0.923, and Brier score of 0.064.
It outperformed the other algorithms in computational efficiency and cross-dataset stability. The top 10 key features for predicting mortality risk in older adults were basophil percentage (BA%), C-reactive protein (CRP), procalcitonin (PCT), D-dimer, AST/ALT ratio, cardiac troponin I (cTnI), standard bicarbonate (SB), age, aspartate transaminase (AST), and oxygen saturation (SaO2).
Non-linear associations and threshold effects were observed (e.g., risk surged when CRP > 100 mg/L or D-dimer > 5–10 μg/mL). The simplified model reduced training time by 58.31% while retaining comparable accuracy and interpretability.
Conclusion: This study developed a TPE-LGBM model based on routine hematological indicators to predict mortality risk in older adults with COVID-19.
The model demonstrated favorable accuracy, efficiency, and interpretability, suggesting the potential value of applying explainable machine learning to address unmet medical needs.
Frontiers in Immunology published a clinical update in Infectious Disease on 25 May 2026. The item focuses on Explainable machine learning-based mortality risk stratification for older adults with COVID-19: pinpointing core immunological biomarkers and revealing dose-threshold effects.