Title: Intelligent Web Crawlers with Federated Learning for Search Engine Freshness
Authors: Mohammad Abu Kausar, Mohammad Nasar
Volume: 9
Issue: 9
Pages: 150-160
Publication Date: 2025/09/28
Abstract:
The exponential growth of online content makes it difficult for search engines to keep their indexes fresh, relevant, and trustworthy. Both traditional crawling strategies and the more adaptive reinforcement learning (RL)-based models remain centralized, which leads to high latency, communication overhead, and privacy risks. This paper introduces an intelligent crawler driven by federated reinforcement learning that integrates distributed training, freshness-aware scheduling, and privacy-preserving aggregation. In this framework, crawler nodes train local models to predict content changes and prioritize high-value pages, while a secure aggregator combines the model updates without any raw data being shared. Experimental results show that the approach achieves an 18% improvement in freshness and a 40% reduction in communication overhead compared to centralized RL-based crawlers. These findings highlight the potential of federated crawling as a scalable, adaptive, and privacy-preserving paradigm for next-generation search engines.
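The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes plain sample-weighted federated averaging (FedAvg), which the abstract does not confirm as the authors' method, and it omits the secure-aggregation layer entirely. The function name `federated_average` and the flat list-of-floats weight format are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def federated_average(
    local_weights: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Combine per-node weight vectors into one global vector.

    Assumption: plain FedAvg, where each crawler node's contribution is
    weighted by how many local samples it trained on. Only model weights
    are passed in; no crawled page content reaches the aggregator.
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per weight vector")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all weight vectors must have the same length")

    global_weights = [0.0] * dim
    for weights, count in zip(local_weights, sample_counts):
        share = count / total  # this node's fraction of all training samples
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Hypothetical example: two crawler nodes, the second trained on 3x more pages.
node_updates = [[1.0, 2.0], [3.0, 4.0]]
pages_seen = [1, 3]
print(federated_average(node_updates, pages_seen))
```

Weighting by sample count lets nodes that observed more page changes influence the global model more. A real deployment would likely add masking or encryption so the aggregator cannot inspect individual updates.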