Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).
Achiam, J. et al. GPT-4 technical report. Preprint at (2023).
Avsec, Ž. et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature 649, 1206–1218 (2026).
Brixi, G. et al. Genome modelling and design across all domains of life with Evo 2. Nature 652, 1349–1361 (2026).
Luo, X., Kang, X. & Schönhuth, A. Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks. Nat. Mach. Intell. 5, 114–125 (2023).
Benegas, G., Albors, C., Aw, A. J., Ye, C. & Song, Y. S. A DNA language model based on multispecies alignment predicts the effects of genome-wide variants. Nat. Biotechnol. 43, 1960–1965 (2025).
Xu, A. et al. SNPBag: a foundation model for multitask genome-scale SNP analysis. Preprint at Research Square (2025).
Li, H. et al. BMFM-DNA: a SNP-aware DNA foundation model to capture variant effects. Preprint at (2025).
Gao, Z., Liu, Q., Zeng, W., Jiang, R. & Wong, W. H. EpiGePT: a pretrained transformer-based language model for context-specific human epigenomics. Genome Biol. 25, 310 (2024).
Zeng, W., Guo, H., Liu, Q. & Wong, W. H. Improving polygenic prediction from whole-genome sequencing data by leveraging predicted epigenomic features. Proc. Natl Acad. Sci. USA 122, e2419202122 (2025).
Zhou, H., Shrikumar, A. & Kundaje, A. Towards a better understanding of reverse-complement equivariance for deep learning models in genomics. In Proc. 16th Machine Learning in Computational Biology Meeting (eds Knowles, D. A. et al.) 1–33 (PMLR, 2022).
Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).
Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genomes. In Proc. International Conference on Learning Representations (OpenReview, 2024).
Mallet, V. & Vert, J.-P. Reverse-complement equivariant networks for DNA sequences. Adv. Neural Inf. Process. Syst. 34, 13511–13523 (2021).
Schiff, Y. et al. Caduceus: bi-directional equivariant long-range DNA sequence modeling. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) Vol. 235, 43632–43657 (PMLR, 2024).
Dalla-Torre, H. et al. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025).
Nguyen, E. et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. Adv. Neural Inf. Process. Syst. 36, 43177–43201 (2023).
Sanabria, M., Hirsch, J., Joubert, P. M. & Poetsch, A. R. DNA language model GROVER learns sequence context in the human genome. Nat. Mach. Intell. 6, 911–923 (2024).
Press, O., Smith, N. A. & Lewis, M. Train short, test long: attention with linear biases enables input length extrapolation. In Proc. International Conference on Learning Representations (OpenReview, 2022).
Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. In Proc. International Conference on Learning Representations (Curran Associates, 2024).
Lee, N. K., Tang, Z., Toneyan, S. & Koo, P. K. EvoAug: enhancing generalization and interpretability of genomic deep neural networks through evolution-inspired data augmentation techniques. Genome Biol. 24, 105 (2023).
Ma, M. Maintaining reverse-complement consistency in DNA language models. Preprint at (2025).
Duan, Q. et al. JanusDNA: a robust bi-directional hybrid DNA foundation model. Adv. Neural Inf. Process. Syst. 38, 68791–68818 (2026).
Shrikumar, A., Greenside, P. & Kundaje, A. Reverse-complement parameter sharing improves deep learning models for genomics. Preprint at bioRxiv (2017).
Choi, C. H. et al. DNA dynamically directs its own transcription initiation. Nucleic Acids Res. 32, 1584–1590 (2004).
Rohs, R. et al. The role of DNA shape in protein–DNA recognition. Nature 461, 1248–1253 (2009).
Kabir, A. et al. Integrating DNA breathing dynamics with deep learning foundational models to enhance genome-wide prediction of human transcription factor binding. Nucleic Acids Res. 52, e91 (2024).
Gordân, R. et al. Genomic regions adjacent to E-box binding sites affect DNA binding specificity of bHLH transcription factors via DNA shape. Cell Rep. 3, 1093–1104 (2013).
Chen, Y. et al. Structural insights into p53 binding to the BAX response element: DNA unwinding and compression enable base-pair insertion. Nucleic Acids Res. 41, 8368–8376 (2013).
Zhou, T. et al. Quantitative modeling of transcription factor binding specificities using DNA shape. Proc. Natl Acad. Sci. USA 112, 4654–4659 (2015).
Mitra, R. et al. Applying geometric deep learning to predict protein–DNA binding specificity. Nat. Methods 21, 1674–1683 (2024).
Adam, S., Klingel, V., Radde, N. E., Bashtrykov, P. & Jeltsch, A. Evaluating the accuracy of the epigenetic copy machine: a comprehensive specificity analysis of the DNMT1 DNA methyltransferase. Nucleic Acids Res. 51, 6622–6633 (2023).
Bashtrykov, P. et al. The specificity of Dnmt1 for hemimethylated CpG site methylation is governed by its catalytic domain. Chem. Biol. 19, 572–578 (2012).
Senadeera, D. C. et al. Dual-branch VideoMamba with gated class token fusion for detecting violent content. Preprint at (2025).
Yu, T., Cheng, L., Khalitov, R., Olsson, E. B. & Yang, Z. Self-distillation enhances self-supervised learning for DNA sequence inference tasks. Neural Netw. 183, 106978 (2025).
Yang, H. et al. HAD: hybrid architecture distillation surpasses teacher performance in genomic sequence modeling. Preprint at (2025).
Hu, J. et al. Comba: enhancing nonlinear RNNs through closed-loop control mechanisms. Preprint at (2025).
Grešová, K., Martinek, V., Čechák, D., Šimeček, P. & Alexiou, P. Genomic Benchmarks: a curated collection of datasets for classifying genomic sequences. BMC Genom. Data 24, 25 (2023).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015).
Li, J., Pu, Y., Tang, J., Zou, Q. & Guo, F. DeepATT: a hybrid category attention neural network for determining functional effects of DNA sequences. Brief. Bioinform. 22, bbaa159 (2021).
Li, Z. et al. An innovative, interpretable deep learning framework for designing synthetic enhancers with wide-ranging activity across species. Nucleic Acids Res. 52, 13447–13468 (2024).
Cheng, W. et al. DNALONGBENCH: a comprehensive benchmark suite for evaluating long-range DNA prediction tasks. Nat. Commun. 16, 10108 (2025).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022).
Gosai, S. J. et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 634, 1211–1220 (2024).
Fishman, V. et al. GENA-LM: a collection of open-source foundational DNA language models tailored for long sequences. Nucleic Acids Res. 53, gkae1310 (2025).
Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).
Ahmed, F. S., Aly, S. & Liu, X. EPI-Trans: a powerful transformer-based deep learning model for predicting enhancer–promoter interactions. BMC Bioinform. 25, 216 (2024).
Feng, H. et al. Evaluating DNA foundation models across genomic and genetic tasks. Nat. Commun. 16, 10780 (2025).
Van Der Harst, P. & Verweij, N. Discovery of 64 new genetic loci broadens our understanding of the genetic basis of coronary artery disease. Circ. Res. 122, 433–443 (2018).
Medina, I. et al. Deficiency in Hck/Fgr kinases decreases plaque growth and stability by reducing monocyte recruitment and movement within plaques. Circulation 132, 490–501 (2015).
Bobryshev, Y. V. Monocyte recruitment and the formation of foam cells in atherosclerosis. Micron 37, 208–222 (2006).
Li, J. et al. New perspectives: dynamic foam cells originating from macrophages in atherosclerosis. J. Cell. Physiol. 236, 6154–6167 (2021).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Beltagy, I., Peters, M. E. & Cohan, A. Longformer: the long-document transformer. Preprint at (2020).
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Su, J. et al. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).
Mou, L. et al. Natural language inference by tree-based convolution and heuristic matching. In Proc. 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (eds Erk, K. et al.) 130–136 (Association for Computational Linguistics, 2016).
Upadhyaya, A., Nejdl, W. & Fisichella, M. Leveraging empathy and ethical considerations to detect relevance and categorize information in climate and COVID-19 tweets. In Proc. 33rd ACM International Conference on Information and Knowledge Management 4091–4095 (ACM, 2024).
Zhang, K. et al. EATN: a highly efficient adaptive transfer network for analyzing sentiment at the aspect level. IEEE Trans. Knowl. Data Eng. 35, 377–389 (2021).
Morales-Brotons, D., Vogels, T. & Hendrikx, H. Using exponential moving averages of weights in deep learning: behavior and advantages. Trans. Mach. Learn. Res. 2024, 1–27 (2024).
Yang, C. et al. CrossDNA: V1.1.0. Zenodo (2026).



