Yandell, M. & Ence, D. A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet. 13, 329–342 (2012).
Lewin, H. A. et al. The Earth BioGenome Project 2020: starting the clock. Proc. Natl Acad. Sci. USA 119, e2115635118 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Korf, I. Gene finding in novel genomes. BMC Bioinf. 5, 59 (2004).
Lukashin, A. V. & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Thibaud-Nissen, F., Souvorov, A., Murphy, T., DiCuccio, M. & Kitts, P. Eukaryotic genome annotation pipeline. The NCBI Handbook Vol. 2 (National Center for Biotechnology Information, 2013).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinf. 12, 1–14 (2011).
Gabriel, L. et al. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 34, 769–777 (2024).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes. Genome Res. 34, 757–768 (2024).
Brůna, T. et al. Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinf. 24, 327 (2023).
Aken, B. L. et al. The Ensembl gene annotation system. Database 2016, baw093 (2016).
Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).
Chen, N. Using Repeat Masker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.1–4.10.14 (2004).
Holst, F. et al. Helixer: ab initio prediction of primary eukaryotic gene models combining deep learning and a hidden Markov model. Nat. Methods (2025).
Stiehler, F. et al. Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36, 5291–5298 (2021).
Gabriel, L., Becker, F., Hoff, K. J. & Stanke, M. Tiberius: end-to-end deep learning with an HMM for gene prediction. Bioinformatics 40, btae685 (2024).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Sætre, G. P. & Saether, S. A. Ecology and genetics of speciation in Ficedula flycatchers. Mol. Ecol. 19, 1091–1106 (2010).
Parkin, I. A. et al. Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biol. 15, R77 (2014).
Chen, L., DeVries, A. L. & Cheng, C.-H. C. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl Acad. Sci. USA 94, 3811–3816 (1997).
Vosseberg, J. et al. The emerging view on the origin and early evolution of eukaryotic cells. Nature 633, 295–305 (2024).
Zhou, Y. et al. Gene fusion as an important mechanism to generate new genes in the genus Oryza. Genome Biol. 23, 130 (2022).
Bang, M.-L. et al. The complete gene sequence of titin, expression of an unusual ≈700-kDa titin isoform, and its interaction with obscurin identify a novel Z-line to I-band linking system. Circ. Res. 89, 1065–1072 (2001).
Vaswani, A. et al. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS 2017) (2017).
Viterbi, A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13, 260–269 (1967).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
Zoph, B. et al. ST-MoE: designing stable and transferable sparse expert models. Preprint at (2022).
Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025).
Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv (2025).
Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2024).
Wang, S. et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 43, 181–185 (2025).
Lin, J. et al. SVision: a deep learning approach to resolve complex structural variants. Nat. Methods 19, 1230–1233 (2022).
de Klerk, E. & ’t Hoen, P. A. C. Alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet. 31, 128–139 (2015).
Xia, Z. et al. Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across seven tumour types. Nat. Commun. 5, 5274 (2014).
Zhang, R. X. et al. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol. 23, 149 (2022).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness. In Gene Prediction: Methods and Protocols (ed Kollmar, M.) 227–245 (Springer, 2019).
Lin, T. Y., Goyal, P., Girshick, R., He, K. M. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (eds Ikeuchi, K. et al.) 2999–3007 (IEEE, 2017).
Li, X. Y. et al. Dice loss for data-imbalanced NLP tasks. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 465–476 (Association for Computational Linguistics, 2020).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds Bajcsy, R. et al.) 770–778 (IEEE, 2016).
Fedus, W., Zoph, B. & Shazeer, N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 23, 1–39 (2022).



