International Journal of Academic Engineering Research (IJAER)

Title: Predicting Smoking-Associated Thyroid Dysfunction Using Explainable Machine Learning

Authors: Duha K. A. AlDaya, Salah Aldin Shadi Aldaya, and Samy S. Abu-Naser

Volume: 9

Issue: 12

Pages: 97-105

Publication Date: 2025/12/28

Abstract:
Background: Smoking alters endocrine function, yet its contribution to thyroid dysfunction has not been quantified using explainable artificial intelligence. Methods: Adult participants from the National Health and Nutrition Examination Survey (NHANES 2009-2012) were analyzed. Demographic variables, smoking exposure metrics, and thyroid hormone levels were used to train Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, and Extreme Gradient Boosting (XGBoost) models. Stratified five-fold cross-validation was applied. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC). Model interpretability was assessed using SHapley Additive exPlanations (SHAP). Results: Ensemble models achieved superior performance, with XGBoost yielding the highest ROC-AUC (0.873), followed by Gradient Boosting (0.861), Random Forest (0.845), Support Vector Machine (0.816), and Logistic Regression (0.793). Smoking exposure variables, particularly pack-years and cigarettes per day, were among the most influential predictors, alongside thyroid-stimulating hormone (TSH) and body mass index (BMI). Conclusion: Explainable machine learning enables clinically meaningful prediction of smoking-associated thyroid dysfunction and may support early endocrine risk identification.
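
Illustrative sketch: The evaluation protocol named in the Methods (stratified five-fold cross-validation scored by ROC-AUC) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than NHANES records, the two features are hypothetical stand-ins for predictors such as pack-years and TSH, and a simple nearest-centroid scorer replaces the five models listed in the abstract. The helper names `stratified_kfold`, `roc_auc`, and `centroid_scores` are introduced here for illustration only. In practice a library such as scikit-learn would typically be used for both steps.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def roc_auc(y_true, scores):
    """ROC-AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(X_train, y_train, X_test):
    """Stand-in model (assumption): score = closeness to the positive centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c_pos = centroid([x for x, y in zip(X_train, y_train) if y == 1])
    c_neg = centroid([x for x, y in zip(X_train, y_train) if y == 0])
    return [sqdist(x, c_neg) - sqdist(x, c_pos) for x in X_test]


# Synthetic data (assumption): 25% positives, two hypothetical features.
rng = random.Random(42)
y = [1] * 50 + [0] * 150
X = [[rng.gauss(1.0 if label else 0.0, 1.0) for _ in range(2)] for label in y]

fold_aucs = []
for train_idx, test_idx in stratified_kfold(y, k=5, seed=0):
    scores = centroid_scores(
        [X[i] for i in train_idx], [y[i] for i in train_idx],
        [X[i] for i in test_idx],
    )
    fold_aucs.append(roc_auc([y[i] for i in test_idx], scores))

print("per-fold ROC-AUC:", [round(a, 3) for a in fold_aucs])
print("mean ROC-AUC:", round(sum(fold_aucs) / len(fold_aucs), 3))
```

Stratifying matters when the outcome is imbalanced, as a dysfunction label usually is: without it, a fold could contain too few positive cases for ROC-AUC to be estimated reliably. The values printed by this sketch describe the synthetic data only and are not comparable to the ROC-AUC figures reported in the abstract.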
