Title: Applying XGBoost Machine Learning Model to Forecast Revenue for Small Businesses
Authors: Thu To Hong Minh Khang Nguyen Phuc
Volume: 9
Issue: 10
Pages: 291-300
Publication Date: 2025/10/28
Abstract:
Short-term revenue forecasting for small and medium-sized enterprises (SMEs) plays a critical role in rapidly changing, competitive retail settings. Reliable short-term forecasts are essential for decisions about inventory, cash flow, staffing, and the timing of promotions. However, SMEs frequently lack the analytical tools needed to produce such forecasts. This research therefore seeks to formulate and assess a practical revenue forecasting system, based on available data and tailored to small businesses. We designed an end-to-end machine learning pipeline that automates the collection and cleaning of historical sales data, the removal of noise and outliers, time series feature engineering that captures trends, seasonality, holidays, weekends, and special events, and the training of a supervised model to forecast revenue. The forecasting engine relies on an eXtreme Gradient Boosting (XGBoost) regressor, chosen for its competence in handling structured business data and its ability to model non-linear relationships. The model was trained and evaluated using a temporal split of the public Walmart Store Sales dataset, in which the last weeks of sales data were set aside to simulate real-time forecasting. The system achieved a high level of predictive accuracy, with an R² of 0.9843 and an MAE of 1267.09 on the training set, capturing local demand and seasonal fluctuations. The variance in forecast errors may indicate operational volatility at the store-department level, which may require managerial intervention. Supervised machine learning techniques respond to planning needs by enabling accurate weekly revenue forecasts for horizons of 30, 60, or 90 days.
This approach allows SMEs to synchronize purchases and cash management with anticipated inflows, assess the influence of scheduled promotions, and pinpoint vulnerable locations, while confirming the potential for future improvements. Such improvements may include the use of external variables (news and local market events) or the application of hybrid or deep learning models for highly volatile demand.
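
The evaluation setup described in the abstract (calendar features, a temporal hold-out of the last weeks, and the MAE and R² metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `calendar_features`, `temporal_split`, `mae`, and `r2`, the row layout with `date` and `revenue` keys, and the synthetic data are all assumptions made for illustration. The mean-of-training prediction is only a stand-in so the metrics have something to score; it is not XGBoost.

```python
from datetime import date, timedelta


def calendar_features(week_start, holidays=frozenset()):
    """Build simple calendar features for one weekly observation.

    `holidays` is a hypothetical set of dates supplied by the caller.
    """
    iso_year, iso_week, _ = week_start.isocalendar()
    week_days = {week_start + timedelta(days=i) for i in range(7)}
    return {
        "year": iso_year,
        "week_of_year": iso_week,
        "month": week_start.month,
        # 1 if any day of the week starting at week_start is a holiday
        "is_holiday_week": int(bool(week_days & set(holidays))),
    }


def temporal_split(rows, n_holdout):
    """Sort rows by date and keep the last n_holdout rows as the hold-out set."""
    if not 0 < n_holdout < len(rows):
        raise ValueError("n_holdout must be between 1 and len(rows) - 1")
    ordered = sorted(rows, key=lambda r: r["date"])
    return ordered[:-n_holdout], ordered[-n_holdout:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic weekly revenue (made-up numbers, not the Walmart data).
    start = date(2012, 1, 2)
    rows = [
        {"date": start + timedelta(weeks=i), "revenue": 1000.0 + 50.0 * (i % 4)}
        for i in range(12)
    ]
    train, test = temporal_split(rows, n_holdout=4)

    # Stand-in predictor: mean of training revenue (NOT the XGBoost model).
    baseline = sum(r["revenue"] for r in train) / len(train)
    y_true = [r["revenue"] for r in test]
    y_pred = [baseline] * len(test)

    print(calendar_features(test[0]["date"]))
    print("MAE:", mae(y_true, y_pred), "R2:", r2(y_true, y_pred))
```

Splitting by date rather than at random keeps every hold-out week strictly later than every training week, which is what lets the held-out period mimic forecasting ahead in time.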