Title: AI-Enhanced Sorting of Genomic Sequences for Accelerated Bioinformatics Analysis
Authors: Nesreen S. Aljerjawi and Samy S. Abu-Naser
Volume: 9
Issue: 8
Pages: 106-109
Publication Date: 2025/08/28
Abstract:
The rapid advancement of next-generation sequencing technologies has led to an exponential increase in the volume of genomic data. Processing and analyzing this data, particularly through computationally intensive tasks such as sequence alignment and assembly, is often a significant bottleneck. Traditional sorting algorithms, which are commonly used as a preliminary step to group similar sequences, typically rely on simple lexicographical or hash-based ordering that does not capture the underlying biological or structural relationships between sequences. This paper proposes a novel approach: an AI-enhanced sorting framework that uses a deep learning model to pre-sort genomic sequences based on learned biological features. The model is a Convolutional Neural Network (CNN) trained on a large dataset of reference genomes paired with a biologically meaningful sort order. The network learns to predict a "biological similarity score" for each sequence, which can then serve as a sort key that places biologically similar sequences next to one another, something lexicographical or hash-based ordering does not guarantee. We demonstrate that this AI-enhanced sorting can significantly reduce the computational time required for subsequent alignment steps by creating more manageable, pre-clustered data blocks. Our results indicate that this method accelerates the overall bioinformatics analysis pipeline, offering a scalable solution for handling petabyte-scale genomic datasets.
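The abstract does not specify the CNN architecture, training data format, or block size, so the following is only a minimal sketch of the general idea of "score, then sort, then block", not the authors' model. It assumes DNA reads are one-hot encoded, and it replaces the trained network with a single hand-set 1-D convolutional filter (followed by ReLU and average pooling) that responds to the "CG" dinucleotide. All names (`one_hot`, `conv1d_score`, `cpg_score`, `ai_sort`, `CG_KERNEL`, `CG_BIAS`) and the CpG-density "score" are hypothetical stand-ins for the learned biological similarity score.

```python
from typing import Callable, List, Sequence

# Hypothetical sketch: the paper's CNN is not described, so one hand-set
# convolutional filter stands in for the trained network.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as one 4-vector per base (order A, C, G, T).
    Non-ACGT symbols such as N become all-zero vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_score(seq: str, kernel: Sequence[Sequence[float]], bias: float) -> float:
    """Slide one 1-D convolutional filter over the one-hot sequence,
    apply ReLU, and average-pool the activations into a scalar score."""
    x = one_hot(seq)
    width = len(kernel)
    if len(x) < width:
        return 0.0  # sequence shorter than the filter: no valid window
    activations = []
    for i in range(len(x) - width + 1):
        s = bias
        for j in range(width):
            for c in range(len(BASES)):
                s += kernel[j][c] * x[i + j][c]
        activations.append(max(0.0, s))  # ReLU
    return sum(activations) / len(activations)  # global average pooling


# Hypothetical filter that fires only on the dinucleotide "CG" (a CpG site):
# position 0 weights C, position 1 weights G; the bias zeroes partial matches.
CG_KERNEL = [[0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]
CG_BIAS = -1.0


def cpg_score(seq: str) -> float:
    """Stand-in for the learned score: fraction of windows that are 'CG'."""
    return conv1d_score(seq, CG_KERNEL, CG_BIAS)


def ai_sort(seqs: Sequence[str], score_fn: Callable[[str], float],
            block_size: int) -> List[List[str]]:
    """Sort sequences by predicted score (ties broken lexicographically)
    and cut the ranked list into fixed-size blocks for downstream alignment."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ranked = sorted(seqs, key=lambda s: (score_fn(s), s))
    return [ranked[i:i + block_size] for i in range(0, len(ranked), block_size)]


if __name__ == "__main__":
    reads = ["ACGCGT", "AAAATT", "CGCGCG", "TTTAAA", "ACGTAA"]
    print(ai_sort(reads, cpg_score, block_size=2))
    # [['AAAATT', 'TTTAAA'], ['ACGTAA', 'ACGCGT'], ['CGCGCG']]
```

For comparison, a plain lexicographic sort of the same reads into blocks of two gives `[['AAAATT', 'ACGCGT'], ['ACGTAA', 'CGCGCG'], ['TTTAAA']]`, which separates the two CpG-free reads. The score-based order keeps them together. In a real pipeline the hand-set filter would be replaced by the trained model's output, and the block size would be tuned to the aligner's memory budget.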