AI-Based Sequence Similarity Analysis as Digital Genetic Evidence: A Pilot Study on Growth-Related Genes
Keywords:
digital genetic evidence, bioinformatics, artificial intelligence, sequence similarity, stunting susceptibility
Abstract
Introduction — Stunting remains a major public health challenge, particularly in low- and middle-income countries, where growth impairment is influenced by complex interactions between environmental and biological factors. While nutritional and socioeconomic determinants have been extensively studied, the potential role of genetic susceptibility related to growth regulation remains underexplored from a bio-digital and forensic informatics perspective. This study investigates whether sequence-level similarity patterns among growth-related genes can be represented as digital genetic evidence using artificial intelligence–based computational analysis.
Methods — This pilot exploratory study analyzed protein and coding DNA sequences of six candidate growth-related genes (IGF1, IGF1R, GH1, GHR, LEP, SLC39A8) obtained from curated RefSeq Homo sapiens databases. An alignment-free analytical framework was implemented using k-mer term frequency–inverse document frequency (TF-IDF) feature extraction combined with principal component analysis for dimensionality reduction. Pairwise similarity assessment and embedding-based visualization were employed to explore latent sequence relationships.
Results — The analysis revealed distinct similarity patterns among growth-related genes, with hormonally associated genes and receptor proteins forming coherent clusters, while nutrient transporter–related genes exhibited clear separation in the embedding space. These patterns were biologically plausible and consistent with known functional characteristics, despite the absence of explicit functional annotation during feature extraction.
Conclusion — The findings demonstrate that AI-based alignment-free sequence analysis can generate reproducible similarity representations that function as digital genetic evidence. As a pilot exploratory study, this work highlights the feasibility of sequence-level similarity profiling for investigating growth-related genetic susceptibility, while providing a methodological foundation for future large-scale and population-specific studies.
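The alignment-free framework summarized under Methods can be illustrated with a minimal sketch. The Python below is not the study's implementation. It assumes k = 3, a smoothed IDF variant, cosine similarity as the pairwise measure, and three made-up toy peptide strings in place of the RefSeq sequences. All function names are hypothetical, and the principal component analysis step is omitted for brevity.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (no alignment needed)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tfidf_vectors(seqs, k=3):
    """Map a dict of name -> sequence to (vocabulary, name -> TF-IDF vector)."""
    counts = {name: kmer_counts(s, k) for name, s in seqs.items()}
    vocab = sorted(set().union(*counts.values()))
    n_docs = len(seqs)
    # Document frequency: number of sequences that contain each k-mer.
    df = {kmer: sum(1 for c in counts.values() if kmer in c) for kmer in vocab}
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {kmer: math.log((1 + n_docs) / (1 + df[kmer])) + 1.0 for kmer in vocab}
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1  # guard against sequences shorter than k
        vectors[name] = [(c[kmer] / total) * idf[kmer] for kmer in vocab]
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarity(vectors):
    """Return a nested dict sim[a][b] of cosine similarities."""
    names = list(vectors)
    return {a: {b: cosine(vectors[a], vectors[b]) for b in names} for a in names}


# Toy placeholder strings, NOT real gene or protein sequences.
TOY_SEQS = {
    "toy_A": "MKTLLVAGLLKTLLVAG",
    "toy_B": "MKTLLVAGLLKTALVAG",  # one substitution relative to toy_A
    "toy_C": "PQRSTPQRSWPQRST",    # shares no 3-mers with toy_A
}

vocab, vectors = tfidf_vectors(TOY_SEQS, k=3)
sim = pairwise_similarity(vectors)

for a in sim:
    print(a, {b: round(s, 3) for b, s in sim[a].items()})
```

In this toy setting, the near-identical pair scores higher than the unrelated pair because they share most of their k-mers. In a fuller pipeline, the TF-IDF vectors would then be passed to a dimensionality reduction step such as PCA before visualization.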


