Title: Development of an Air Quality Forecasting System Based on Random Forest and XGBoost Algorithms
Authors: Thanh Nguyen Cong Truong Tran Quang
Volume: 9
Issue: 10
Pages: 128-136
Publication Date: 2025/10/28
Abstract:
Air pollution is becoming a serious problem in large cities, where fine particulate matter (PM2.5) directly affects human health. This study introduces an integrated system for assessing, forecasting, and issuing warnings about air quality based on real-time data collected from environmental sensors. The goal is to support smarter and more effective monitoring and management of urban environments. The system uses machine learning algorithms to analyze and forecast PM2.5 concentrations. Specifically, Random Forest and XGBoost models are applied to the forecasting problem, while unsupervised learning methods such as PCA (Principal Component Analysis) and K-Means clustering are used to explore the data structure and group pollution samples with similar characteristics. Experimental results on real data show that the Random Forest model achieves high forecasting performance with RMSE = 4.612, MAE = 2.417, and R² = 0.916, outperforming baseline models in both accuracy and stability. Architecturally, the system uses a Flask backend to process data and run the machine learning models, and a React frontend to display forecast results and visualize multidimensional data. This design makes the system flexible, scalable, and deployable in many different environments. Overall, the study shows that combining machine learning with modern web technology is a promising approach for future real-time air quality monitoring and management systems.
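
For readers unfamiliar with the three metrics reported in the abstract, the following is a minimal sketch of how RMSE, MAE, and R² are conventionally computed for a regression forecast. It is not the authors' evaluation code: the function name `regression_metrics` and the sample PM2.5 values are hypothetical, chosen only for illustration, and the sketch assumes the standard textbook definitions of the metrics.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by RMSE and R^2.
    ss_res = sum(e * e for e in errors)

    rmse = math.sqrt(ss_res / n)            # root mean squared error
    mae = sum(abs(e) for e in errors) / n   # mean absolute error

    # Total sum of squares around the mean of the observations.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination

    return rmse, mae, r2


# Made-up PM2.5 readings and forecasts (assumed values, not the study's data).
observed = [12.0, 35.0, 20.0, 48.0]
predicted = [14.0, 33.0, 21.0, 45.0]

rmse, mae, r2 = regression_metrics(observed, predicted)
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

Lower RMSE and MAE indicate smaller forecast errors, with RMSE penalizing large errors more heavily, while R² closer to 1 means the model explains more of the variance in the observed concentrations.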