Title: Predicting Smoking-Associated Thyroid Dysfunction Using Explainable Machine Learning
Authors: Duha K. A. AlDaya, Salah Aldin Shadi Aldaya, and Samy S. Abu-Naser
Volume: 9
Issue: 12
Pages: 97-105
Publication Date: 2025/12/28
Abstract:
Background: Smoking alters endocrine function, yet its contribution to thyroid dysfunction has not been quantified using explainable artificial intelligence. Methods: Adult participants from the National Health and Nutrition Examination Survey (NHANES 2009-2012) were analyzed. Demographic variables, smoking exposure metrics, and thyroid hormone levels were used to train Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, and Extreme Gradient Boosting (XGBoost) models. Stratified five-fold cross-validation was applied. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC). Model interpretability was assessed using SHapley Additive exPlanations (SHAP). Results: Ensemble models achieved superior performance, with XGBoost yielding the highest ROC-AUC (0.873), followed by Gradient Boosting (0.861), Random Forest (0.845), Support Vector Machine (0.816), and Logistic Regression (0.793). Smoking exposure variables, particularly pack-years and cigarettes per day, were among the most influential predictors, alongside thyroid-stimulating hormone (TSH) and body mass index (BMI). Conclusion: Explainable machine learning enables clinically meaningful prediction of smoking-associated thyroid dysfunction and may support early endocrine risk identification.
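
Illustrative sketch (not the authors' code): the evaluation protocol named in the Methods, stratified five-fold cross-validation scored by ROC-AUC, can be outlined in plain Python as below. Everything here is an assumption made for illustration: the helper names `stratified_kfold`, `roc_auc`, and `demo` are hypothetical, the data are synthetic rather than NHANES, and a one-feature toy scorer stands in for the five models reported in the abstract. SHAP is not shown.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def roc_auc(y_true, scores):
    """ROC-AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def demo(n=500, seed=1):
    """Run the protocol on synthetic data (assumption: ~20% positives, one feature)."""
    rng = random.Random(seed)
    y = [1 if rng.random() < 0.2 else 0 for _ in range(n)]
    x = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]
    aucs = []
    for train, test in stratified_kfold(y, k=5, seed=seed):
        # Toy "fit": learn only the direction of the feature from the training fold.
        mu1 = sum(x[i] for i in train if y[i] == 1) / sum(y[i] for i in train)
        mu0 = sum(x[i] for i in train if y[i] == 0) / sum(1 - y[i] for i in train)
        sign = 1.0 if mu1 >= mu0 else -1.0
        aucs.append(roc_auc([y[i] for i in test], [sign * x[i] for i in test]))
    return aucs


if __name__ == "__main__":
    fold_aucs = demo()
    print("per-fold ROC-AUC:", [round(a, 3) for a in fold_aucs])
    print("mean ROC-AUC:", round(sum(fold_aucs) / len(fold_aucs), 3))
```

Stratifying the split keeps the share of dysfunction cases roughly equal in every fold, which matters when the positive class is a minority; averaging the per-fold ROC-AUC then gives a single summary figure of the kind quoted in the Results. In practice a library implementation of the splitter and metric would normally replace these hand-written helpers.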