Title: Soft Voting Ensemble Approach of Logistic Regression and Random Forest for Stroke Risk Prediction
Authors: Arinda Mahadesyawardani, Nur Chamidah, Marisa Rifada, Toha Saifudin
Volume: 9
Issue: 12
Pages: 97-102
Publication Date: 2025/12/28
Abstract:
Stroke is a major global health burden, making early risk prediction crucial for prevention and clinical decision-making. This study evaluates a Soft Voting Ensemble (SVE) that combines Logistic Regression and Random Forest for binary stroke classification. With hyperparameters tuned by 5-fold cross-validation on the training set, the SVE outperformed both individual models in in-sample and out-of-sample evaluations. The ensemble achieved an F1-score of 0.80 in both settings, with an AUC of 0.91 in-sample and 0.89 out-of-sample. Feature importance analysis identified age and lifestyle-related attributes as key contributors, consistent with established stroke risk factors. These findings indicate that ensemble learning can support clinical assessment and risk stratification, and they offer a promising direction for developing more reliable stroke prediction systems.
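
The soft voting rule named in the abstract can be illustrated with a minimal sketch. It assumes each base model outputs a positive-class probability per patient and that the ensemble takes a (by default equal-weight) average before thresholding at 0.5. The function names, the weights, the threshold, and the probability values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted mean positive-class probability per sample.

    model_probs[m][i] is model m's predicted P(stroke) for sample i.
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * n_models  # equal weights (assumed default)
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    total = float(sum(weights))
    # Average the probabilities, not the hard labels: this is what
    # distinguishes soft voting from majority (hard) voting.
    return [
        sum(w * p[i] for w, p in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into binary labels (1 = stroke)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical probabilities for four patients (illustrative only).
lr_probs = [0.20, 0.60, 0.90, 0.40]  # stand-in for Logistic Regression
rf_probs = [0.10, 0.30, 0.80, 0.70]  # stand-in for Random Forest

ensemble_probs = soft_vote([lr_probs, rf_probs])
ensemble_labels = predict_labels(ensemble_probs)
print(ensemble_labels)
```

In this toy input the two models disagree on the second and fourth patients, and the averaged probability sides with whichever model is more confident. In practice the per-model probabilities would come from fitted classifiers, for example via a library such as scikit-learn, rather than from hand-written lists.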