A Predictive Auto-Scaling Framework for Microservices in Distributed Systems: A Cost-Performance Optimization Approach for U.S. Enterprises

Peter Gbenle, Olumese Anthony Abieba, Wilfred Oseremen Owobu, James Paul Onoja, Andrew Ifesinachi Daraojimba, Adebusayo Hassanat Adepoju, Ubamadu Bright Chibunna

International Journal of Academic Management Science Research (IJAMSR)

Volume: 9

Issue: 4

Pages: 364-388

Publication Date: 2025/04/28

Abstract:
The growing adoption of microservices architecture among U.S. enterprises has changed how applications are developed and deployed, enabling greater scalability, flexibility, and resilience. However, allocating resources efficiently in distributed systems remains a critical challenge, especially when balancing cost against performance. This study proposes a Predictive Auto-Scaling Framework that dynamically manages the computational resources of microservices in distributed environments. By combining machine learning algorithms with historical and real-time workload data, the framework anticipates demand fluctuations and adjusts resource allocations before they occur. This predictive capability is intended to sustain system performance while reducing operational costs, a critical concern for enterprises operating under tight IT budgets and varying workload intensities. The framework uses time-series forecasting models, such as ARIMA and LSTM, to predict workload patterns and integrates with the Kubernetes Horizontal Pod Autoscaler (HPA) for automated, rule-based scaling. It also introduces a cost-performance optimization layer that uses reinforcement learning to evaluate candidate scaling strategies and select the most efficient configuration in terms of CPU, memory, and throughput requirements. A multi-metric decision model is used to ensure that service-level objectives (SLOs) are met without overprovisioning resources. Empirical evaluations were conducted using real-world workloads from financial and e-commerce applications, which are particularly sensitive to performance degradation and resource costs. The results show a reduction in cloud resource consumption of up to 28% while maintaining or improving application response times and availability. The approach addresses key business goals such as cost-efficiency, service quality, and scalability, which are vital for digital competitiveness in the U.S. market.
This research offers a scalable, adaptive solution for enterprise IT managers and cloud architects seeking to optimize microservice deployments in a cost-effective and performance-driven manner. The proposed framework can be extended to multi-cloud and hybrid-cloud environments, ensuring broader applicability in diverse enterprise scenarios. The study contributes to advancing intelligent resource orchestration techniques in distributed computing, aligning technological innovation with enterprise priorities.
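The forecast-then-scale idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: Holt's linear exponential smoothing stands in for the ARIMA and LSTM models named above, and every name here (forecast_next, replicas_for, plan_replicas, per_pod_rps) is hypothetical. The per-pod capacity figure and the replica bounds are assumed inputs, and the reinforcement-learning cost layer is omitted.

```python
import math


def forecast_next(history, alpha=0.5, beta=0.3):
    """One-step-ahead forecast using Holt's linear exponential smoothing.

    Stand-in for the ARIMA/LSTM forecasters mentioned in the abstract;
    alpha and beta are assumed smoothing factors, not values from the paper.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    level = history[0]
    trend = history[1] - history[0]
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


def replicas_for(load_rps, per_pod_rps, min_replicas=1, max_replicas=20):
    """Replica count needed to serve load_rps, clamped to the allowed range.

    per_pod_rps is an assumed, externally measured per-pod capacity.
    """
    needed = math.ceil(load_rps / per_pod_rps)
    return max(min_replicas, min(max_replicas, needed))


def plan_replicas(history, per_pod_rps, **bounds):
    """Size for the larger of predicted and latest observed load."""
    predicted = forecast_next(history)
    # Taking the max guards against a forecast that undershoots current demand.
    return replicas_for(max(predicted, history[-1]), per_pod_rps, **bounds)


if __name__ == "__main__":
    recent_rps = [100, 200, 300, 400]  # hypothetical requests per second
    print(plan_replicas(recent_rps, per_pod_rps=120))
```

Sizing for the larger of the predicted and the most recent observed load is one conservative design choice: it favors meeting SLOs over cost savings whenever the forecast undershoots. A cost-aware layer such as the one the abstract describes would presumably relax that choice when the forecast has proven reliable.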
