International Journal of Engineering and Information Systems (IJEAIS)

Title: Developing a Machine Learning-Based Forest Fire Prediction System Using Random Forest and XGBoost

Authors: Phong Nguyen Hoang Quan Nguyen Dong Truong Tran Quang

Volume: 9

Issue: 11

Pages: 41-51

Publication Date: 2025/11/28

Abstract:
Wildfire risk has intensified under hotter, drier, and more variable climate conditions, creating the need for operational systems that can assess danger levels in near real time rather than relying solely on delayed manual or satellite-based monitoring. This study develops, evaluates, and deploys a machine learning-based forest fire prediction system capable of (i) classifying fire-risk conditions ("Fire" vs "Safe") and (ii) estimating a continuous Fire Weather Index (FWI) value from routinely measured meteorological and fuel-condition variables. The dataset encompasses temperature, wind speed, relative humidity, rainfall, and key fire behavior indices like FFMC, DMC, ISI, and FWI. Missing-value handling, outlier treatment, and feature scaling and encoding are included in the pipeline, along with multicollinearity and feature dominance control through correlation matrices and feature importance scoring. Class imbalance is addressed with SMOTE and class weighting. Several candidate models are assessed under both stratified and standard k-fold cross-validation, with hyperparameter tuning subsequently conducted using RandomizedSearchCV. Ensemble models outperform baseline approaches: the tuned XGBoost classifier attains approximately 98% accuracy and an F1-score of roughly 0.97 for binary fire-risk prediction, and the Random Forest Regressor achieves R² ≈ 0.98 with RMSE ≈ 0.10 for FWI estimation, with minimal overfitting across training, validation, and test splits. The final optimized models, together with the preprocessing pipeline, are serialized and deployed in a Flask web application that accepts environmental inputs, returns real-time fire-risk assessments, and logs predictions for monitoring.
These results indicate that data-driven wildfire warning can be implemented as an operational service; however, model generalization is still constrained by the limited geographic and temporal scope of the source dataset, highlighting the need for broader multi-regional, multi-season training data and integration of additional spatial context (e.g., vegetation, terrain, remote sensing) in future work.
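
For readers unfamiliar with the metrics reported in the abstract (accuracy and F1-score for the "Fire" vs "Safe" classifier, R² and RMSE for FWI estimation), the following is a minimal sketch of how such scores are conventionally computed. It is not the authors' evaluation code: the function names `accuracy_and_f1` and `r2_and_rmse` and the toy labels and values are hypothetical, chosen only for illustration, and "Fire" is assumed to be the positive class.

```python
import math


def accuracy_and_f1(y_true, y_pred, positive="Fire"):
    """Return (accuracy, F1) for a binary task; `positive` is an assumed label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for a regression task such as FWI estimation."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical toy data, not taken from the study's dataset.
labels = ["Fire", "Safe", "Fire", "Safe", "Fire"]
predicted = ["Fire", "Safe", "Safe", "Safe", "Fire"]
acc, f1 = accuracy_and_f1(labels, predicted)

fwi_true = [1.0, 2.0, 3.0, 4.0]
fwi_pred = [1.0, 2.0, 3.0, 5.0]
r2, rmse = r2_and_rmse(fwi_true, fwi_pred)
```

On imbalanced fire data, F1 is the more informative of the two classification scores, because accuracy can stay high even when most "Fire" cases are missed; this is presumably why the abstract reports both.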
