Title: Processing and Normalization of Uzbek Texts
Authors: Sobirov Shohjahon Ganijon ogli, Sobirova Nazira Ganijon kizi, Sobirova Zarnigor Ganijon kizi
Volume: 10
Issue: 2
Pages: 47-55
Publication Date: 2026/02/28
Abstract:
- Technologies for the automatic analysis of text by computer are advancing rapidly. When working with texts written in Uzbek, an essential first step is their initial preparation, that is, preprocessing and text normalization. This article provides a detailed overview of these processes. Uzbek is an agglutinative language that makes extensive use of derivational and grammatical affixes, which significantly complicates morphological analysis and the identification of word forms. Preprocessing involves operations such as removing unnecessary characters, splitting text into words, and converting to lowercase. Normalization, on the other hand, includes correcting misspelled words, converting words to their base dictionary forms, and expanding abbreviations. Because of the complex structure of Uzbek, these processes pose considerable challenges. The article discusses methods and tools developed to overcome these difficulties and illustrates their effectiveness with practical examples. The findings from ongoing research contribute to improving the quality of Uzbek language processing in digital environments, including applications in machine translation, speech-to-text systems, and automated text analysis.
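
The preprocessing operations named in the abstract (removing unnecessary characters, splitting text into words, lowercasing) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `preprocess`, the choice to unify apostrophe-like characters to a plain `'` (so that variants of Uzbek Latin oʻ and gʻ compare equal), and the token pattern are all assumptions made for illustration. Normalization steps such as spelling correction, lemmatization, and abbreviation expansion are not covered.

```python
import re

# Assumption: apostrophe-like characters that appear in Uzbek Latin text
# (U+02BB, U+02BC, curly quotes, backtick) are unified to a plain "'".
APOSTROPHES = "\u02bb\u02bc\u2018\u2019`"
_APOS_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHES})

# A token is a run of letters that may contain or end with apostrophes,
# e.g. "o'zbek" or "tog'". Digits and punctuation are dropped.
_TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]*)*")


def preprocess(text):
    """Hypothetical helper: unify apostrophes, lowercase, and tokenize."""
    text = text.translate(_APOS_TABLE).lower()
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess("O\u02bbzbek tili, 2026-yil!"))
```

One limitation of this sketch: because curly quotes are mapped to the same character as the letter-modifying apostrophe, a closing quotation mark directly after a word is kept as part of the token. A fuller pipeline would need to tell the two uses apart.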