Objective: We sought to use machine learning algorithms to identify the complex clinical and serological signature of anti-centromere antibody (ACA) positivity in patients with Sjögren’s syndrome (SS).
Methods: This multicenter study analyzed clinical data from a cohort of 616 patients diagnosed with SS, comprising 81 ACA-positive and 535 ACA-negative cases. For model development, we randomly partitioned the dataset into training and validation subsets in a 7:3 ratio.
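As an illustration of the 7:3 random partition described above, the following is a minimal sketch using only the Python standard library. The function name `split_7_3`, the fixed seed, and the use of integer patient indices are assumptions made for the example; the study's actual splitting procedure (for instance, whether it was stratified by ACA status) is not specified here.

```python
import random

def split_7_3(ids, seed=0):
    """Shuffle ids and return (training, validation) lists in a 7:3 ratio."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    cut = round(0.7 * len(shuffled))       # size of the training subset
    return shuffled[:cut], shuffled[cut:]

# Hypothetical patient indices standing in for the 616-patient cohort.
patients = list(range(616))
train_ids, valid_ids = split_7_3(patients)
print(len(train_ids), len(valid_ids))  # 431 185
```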
After selecting optimal predictors with LASSO regression, we implemented and compared six machine learning models. Model performance was evaluated primarily by the area under the receiver operating characteristic curve (AUC), together with a set of complementary metrics.
To ensure clinical interpretability, we also used SHAP (SHapley Additive exPlanations) analysis to quantify the influence of each feature on the model’s output.
Results: Among the evaluated models, the gradient boosting decision tree (GBDT) model showed the best predictive performance, with an AUC of 0.812 in the training set and 0.811 (95% CI: 0.699–0.906) in the validation cohort.
It also had the highest sensitivity among the evaluated models (0.750 in the validation set).
Model performance was primarily judged by AUC and additional metrics, with interpretability aided by SHAP analysis.
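To make the reported metrics concrete, the sketch below shows one way an AUC, a 95% confidence interval, and sensitivity can be computed from predicted scores. It assumes a percentile bootstrap for the interval and a 0.5 decision threshold for sensitivity; neither is stated in the source, and the labels and scores are toy values, not study data. All function names are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(labels, scores, threshold=0.5):
    """True-positive rate at a given threshold (0.5 is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples missing one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy data: 1 = ACA-positive, 0 = ACA-negative (illustrative only).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.5, 0.1]
auc_value = auc(labels, scores)
sens_value = sensitivity(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(round(auc_value, 3), round(sens_value, 3))
```

With a small validation set and few positive cases, a bootstrap interval of this kind tends to be wide, which is consistent with the width of the interval reported for the validation cohort.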