Synthetic Data: Transforming Virtual Experiments Into Groundbreaking Biomedical Discoveries


