Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Demetci, P., Santorella, R., Sandstede, B., Noble, W. S. & Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol. 29, 3–18 (2022).
van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).
Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2021).
Bommasani, R. et al. On the opportunities and risks of foundation models. CoRR, abs/2108.07258. Preprint at (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 1877–1901 (Curran Associates, 2020); https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Vaswani, A. et al. Attention is all you need. Preprint at (2017).
Bunne, C. et al. How to build the virtual cell with artificial intelligence: priorities and opportunities. Cell 187, 7045–7063 (2024).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Chellappa, R. et al.) 10684–10695 (IEEE, 2022).
Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).
Boiarsky, R. et al. Deeper evaluation of a single-cell foundation model. Nat. Mach. Intell. 6, 1443–1446 (2024).
Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).
Goyal, S., Maini, P., Lipton, Z. C., Raghunathan, A. & Kolter, J. Z. Scaling laws for data filtering—data curation cannot be compute agnostic. In Proc. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 22702–22711 (IEEE, 2024).
Hoffmann, J. et al. Training compute-optimal large language models. In Advances in Neural Information Processing Systems 30016–30030 (NeurIPS, 2022).
Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at (2023).
Richter, T., Bahrami, M., Xia, Y., Fischer, D. S. & Theis, F. J. Delineating the effective use of self-supervised learning in single-cell genomics. Nat. Mach. Intell. 7, 68–78 (2025).
Heimberg, G. et al. A cell atlas foundation model for scalable search of similar human cells. Nature 638, 1085–1094 (2025).
Fischer, F. et al. scTab: scaling cross-tissue single-cell annotation models. Nat. Commun. 15, 6611 (2024).
Hie, B., Cho, H., DeMeo, B., Bryson, B. & Berger, B. Geometric sketching compactly summarizes the single-cell transcriptomic landscape. Cell Syst. 8, 483–493 (2019).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Roswell, M., Dushoff, J. & Winfree, R. A conceptual guide to measuring species diversity. Oikos 130, 321–338 (2021).
Friedman, D. & Dieng, A. B. The Vendi Score: a diversity evaluation metric for machine learning. Trans. Mach. Learn. Res. (2023).
Heimlich, J. B. et al. Multiomic profiling of human clonal hematopoiesis reveals genotype and cell-specific inflammatory pathway activation. Blood Adv. 8, 3665–3678 (2024).
Moerkens, R. et al. An iPSC-derived small intestine-on-chip with self-organizing epithelial, mesenchymal, and neural cells. Cell Rep. 43, 114247 (2024).
Hoo, R. et al. Acute response to pathogens in the early human placenta at single-cell resolution. Cell Syst. 15, 425–444 (2024).
Easter, Q. T. et al. Single-cell and spatially resolved interactomics of tooth-associated keratinocytes in periodontitis. Nat. Commun. 15, 5016 (2024).
Kim, N. et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat. Commun. 11, 2285 (2020).
Acera-Mateos, M. et al. Systematic benchmarking of single-cell multimodal data integration improves cell type resolution and uncovers clinically relevant states in complex tissues. Genome Biol. 27, 64 (2026).
Edgar, R. D. et al. A single-cell map of the pediatric human liver uncovers gene expression patterns that evolve with age. Hepatol. Commun. 9, e0813 (2025).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Zhang, J. et al. Tahoe-100M: a giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. Preprint at bioRxiv (2025).
Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 (2022).
Montserrat-Ayuso, T. & Esteve-Codina, A. Abundant nuclei-free low-quality cells in reference single-cell atlases: a push for stricter quality control through nuclear fraction assessment. BMC Genomics 25, 1124 (2024).
Nadig, A. et al. How training data composition shapes deep learning models in single-cell biology. Preprint at bioRxiv (2025).
Tejada-Lapuerta, A. et al. Nicheformer: a foundation model for single-cell and spatial omics. Nat. Methods 22, 2525–2538 (2025).
Chen, Y. & Zou, J. Simple and effective embedding model for single-cell biology built from ChatGPT. Nat. Biomed. Eng. 9, 483–493 (2025).
Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).
Levine, D. et al. Cell2Sentence: teaching large language models the language of biology. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) (ICML, 2024).
Fu, X. et al. A foundation model of transcription across human cell types. Nature 637, 965–973 (2025).
Bian, H. et al. in Research in Computational Molecular Biology Vol. 14758 (ed. Ma, J.) 479–482 (Springer, 2024).
Quake, S. R. The cell as a bag of RNA. Trends Genet. 37, 1064–1068 (2021).
Townes, F. W., Hicks, S. C., Aryee, M. J. & Irizarry, R. A. Feature selection and dimension reduction for single-cell RNA-seq based on a multinomial model. Genome Biol. 20, 1–16 (2019).
Bouland, G. A., Mahfouz, A. & Reinders, M. J. T. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol. 24, 86 (2023).
di Montesano, S. C. et al. Enhancing atlas-scale single-cell annotation models through hierarchical cross-entropy loss. Nat. Comput. Sci. 6, 243–249 (2026).
Kaplan, J. et al. Scaling laws for neural language models. Preprint at (2020).
Hoffmann, J. et al. Training compute-optimal large language models. Preprint at (2022b).
Huang, K. et al. Sequential optimal experimental design for perturbation screens informed by multi-modal priors. Preprint at bioRxiv (2023).
Navidi, Z. et al. Adaptive resampling to boost machine learning performance on imbalanced single-cell datasets. Preprint at bioRxiv (2025).
Gowri, G., Yin, P. & Klein, A. M. Measurement noise scaling laws for learning cellular representations. Preprint at (2025).
Abdulla, S. et al. CELLxGENE Discover: a single-cell data platform designed for scalable exploration, analysis, and modeling of aggregated datasets. Nucleic Acids Res. 53, D886–D900 (2024).
Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).
Rand, W. M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971).
Simpson, E. H. Measurement of diversity. Nature 163, 688 (1949).
Roy, O. & Vetterli, M. The effective rank: a measure of effective dimensionality. In 2007 15th European Signal Processing Conference 606–610 (2007).
DenAdel, A. Microsoft/scFM data selection. Zenodo (2026).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).



