Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Habicht, J. et al. Closing the accessibility gap to mental health treatment with a personalized self-referral chatbot. Nat. Med. 30, 595–602 (2024).
Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
Wan, P. et al. Outpatient reception via collaboration between nurses and a large language model: a randomized controlled trial. Nat. Med. 30, 2878–2885 (2024).
Huang, K. et al. A foundation model for clinician-centered drug repurposing. Nat. Med. 30, 3601–3613 (2024).
Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).
Liu, X. et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 31, 932–942 (2025).
Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).
Johri, S. et al. An evaluation framework for clinical use of large language models in patient interaction tasks. Nat. Med. 31, 77–86 (2025).
Ao, G. et al. Comparative analysis of large language models on rare disease identification. Orphanet J. Rare Dis. 20, 150 (2025).
Shyr, C. Large language models for rare disease diagnosis at the Undiagnosed Diseases Network. JAMA Netw. Open 8, e2528538 (2025).
Weiner, S. J. & Schwartz, A. Listening for What Matters: Avoiding Contextual Errors in Health Care (Oxford Univ. Press, 2023).
Yu, K.-H. & Kohane, I. S. Framing the challenges of artificial intelligence in medicine. BMJ Qual. Saf. 28, 238–241 (2019).
Zhang, S., Liu, Q., Qin, G., Naumann, T. & Poon, H. Med-RLVR: emerging medical reasoning from a 3B base model via reinforcement learning. Preprint at (2025).
Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).
McDermott, M. B. A., Yap, B., Szolovits, P. & Zitnik, M. Structure-inducing pre-training. Nat. Mach. Intell. 5, 612–621 (2023).
Guo, L. L. et al. A multi-center study on the adaptability of a shared foundation model for electronic health records. npj Digit. Med. 7, 171 (2024).
Wornow, M. et al. The shaky foundations of large language models and foundation models for electronic health records. npj Digit. Med. 6, 135 (2023).
Pais, C. et al. Large language models for preventing medication direction errors in online pharmacies. Nat. Med. 30, 1574–1582 (2024).
Sabuncu, M. R., Wang, A. Q. & Nguyen, M. Ethical use of artificial intelligence in medical diagnostics demands a focus on accuracy, not fairness. NEJM AI 2, AIp2400672 (2024).
Li, M. M. et al. Contextual AI models for single-cell protein biology. Nat. Methods 21, 1546–1557 (2024).
Kather, J. N., Ferber, D., Wiest, I. C., Gilbert, S. & Truhn, D. Large language models could make natural language again the universal interface of healthcare. Nat. Med. 30, 2708–2710 (2024).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Liu, H. et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Adv. Neural Inf. Process. Syst. 35, 1950–1965 (2022).
Pan, J., Gao, T., Chen, H. & Chen, D. What in-context learning ‘learns’ in-context: disentangling task recognition and task learning. In Findings of the Association for Computational Linguistics 8298–8319 (ACL, 2023).
Min, S. et al. Rethinking the role of demonstrations: what makes in-context learning work? In Proc. 2022 Conference on Empirical Methods in Natural Language Processing 11048–11064 (ACL, 2022).
Chen, B., Zhang, Z., Langrené, N. & Zhu, S. Unleashing the potential of prompt engineering for large language models. Patterns 6, 101260 (2025).
Shen, S. et al. Multitask vision-language prompt tuning. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 5656–5667 (IEEE, 2024).
Wang, W. et al. VisionLLM: large language model is also an open-ended decoder for vision-centric tasks. Adv. Neural Inf. Process. Syst. 36, 61501–61513 (2023).
Tanwani, A. K., Barral, J. & Freedman, D. RepsNet: combining vision with language for automated medical reports. In International Conference on Medical Image Computing and Computer-Assisted Intervention 714–724 (Springer, 2022).
Shentu, J. & Al Moubayed, N. CXR-IRGen: an integrated vision and language model for the generation of clinically accurate chest X-ray image-report pairs. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (IEEE, 2024).
Wu, S. et al. CollabLLM: from passive responders to active collaborators. In Proc. 42nd International Conference on Machine Learning (PMLR, 2025).
Alsentzer, E. et al. Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. npj Digit. Med. 8, 380 (2025).
Goh, E. et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw. Open 7, e2440969 (2024).
Wang, L. et al. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. npj Digit. Med. 7, 41 (2024).
Khattab, O. et al. DSPy: compiling declarative language model calls into state-of-the-art pipelines. In International Conference on Learning Representations (ICLR, 2024).
Yuksekgonul, M. et al. Optimizing generative AI by backpropagating language model feedback. Nature 639, 609–616 (2025).
Vaziri, M., Mandel, L., Spiess, C. & Hirzel, M. PDL: a declarative prompt programming language. Preprint at (2024).
Lu, Y. et al. Towards doctor-like reasoning: Medical RAG fusing knowledge with patient analogy via textual gradients. In 39th Conference on Neural Information Processing Systems (NeurIPS, 2025).
Maharjan, J. et al. OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Sci. Rep. 14, 14156 (2024).
Nori, H. et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. Preprint at (2023).
Wu, S., Koo, M., Scalzo, F. & Kurtz, I. AutoMedPrompt: a new framework for optimizing LLM medical prompts using textual gradients. Preprint at (2025).
Yu, F. et al. Heterogeneity and predictors of the effects of AI assistance on radiologists. Nat. Med. 30, 837–849 (2024).
Rrv, A., Tyagi, N., Uddin, M. N., Varshney, N. & Baral, C. Chaos with keywords: exposing large language models sycophancy to misleading keywords and evaluating defense strategies. In Findings of the Association for Computational Linguistics 12717–12733 (ACL, 2024).
Fanous, A. et al. SycEval: evaluating LLM sycophancy. In Proc. AAAI/ACM Conference on AI, Ethics, and Society 8, 893–900 (ACM, 2025).
Su, X. et al. KGARevion: an AI agent for knowledge-intensive biomedical QA. In International Conference on Learning Representations (ICLR, 2025).
Zhang, G. et al. Leveraging long context in retrieval augmented language models for medical question answering. npj Digit. Med. 8, 239 (2025).
Ke, Y. H. et al. Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. npj Digit. Med. 8, 187 (2025).
Kresevic, S. et al. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. npj Digit. Med. 7, 102 (2024).
Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. npj Digit. Med. 8, 45 (2025).
Asai, A., Wu, Z., Wang, Y., Sil, A. & Hajishirzi, H. Self-RAG: learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations (ICLR, 2024).
Yang, D., Zeng, L., Rao, J. & Zhang, Y. Knowing you don’t know: learning when to continue search in multi-round RAG via self-practicing. In Proc. 48th International ACM SIGIR Conference on Research and Development in Information Retrieval 1305–1315 (ACM, 2025).
Islam, S. B. et al. Open-RAG: enhanced retrieval augmented reasoning with open-source large language models. In Findings of the Association for Computational Linguistics 14231–14244 (ACL, 2024).
Jeong, S., Baek, J., Cho, S., Hwang, S. J. & Park, J. C. Adaptive-RAG: learning to adapt retrieval-augmented large language models through question complexity. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1: Long Papers) 7036–7050 (ACL, 2024).
Yang, R. et al. Retrieval-augmented generation for generative artificial intelligence in health care. npj Health Syst. 2, 2 (2025).
Anisuzzaman, D. M., Malins, J. G., Friedman, P. A. & Attia, Z. I. Fine-tuning large language models for specialized use cases. Mayo Clin. Proc. Digit. Health 3, 100184 (2025).
Wiest, I. C. et al. Deidentifying medical documents with local, privacy-preserving large language models: the LLM-Anonymizer. NEJM AI 2, 4 (2025).
Croskerry, P. A universal model of diagnostic reasoning. Acad. Med. 84, 1022–1028 (2009).
Geiping, J. et al. Scaling up test-time compute with latent reasoning: a recurrent depth approach. In 39th Annual Conference on Neural Information Processing Systems (NeurIPS, 2025).
Makarov, N. et al. Large language models forecast patient health trajectories enabling digital twins. npj Digit. Med. 8, 588 (2025).
Renc, P. et al. Zero shot health trajectory prediction using transformer. npj Digit. Med. 7, 256 (2024).
Wang, J. et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat. Med. 31, 609–617 (2025).
Rao, V. M. et al. Multimodal generative AI for medical image interpretation. Nature 639, 888–896 (2025).
Duan, Y., Xu, C., Pei, J., Han, J. & Li, C. Pre-train and plug-in: flexible conditional text generation with variational auto-encoders. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 253–262 (ACL, 2020).
Epstein, D., Jabri, A., Poole, B., Efros, A. & Holynski, A. Diffusion self-guidance for controllable image generation. Adv. Neural Inf. Process. Syst. 36, 16222–16239 (2023).
Li, Z. et al. ControlAR: controllable image generation with autoregressive models. In 13th International Conference on Learning Representations (ICLR, 2025).
Beattie, J. et al. Using large language models to create patient-centered consent forms. Int. J. Radiat. Oncol. Biol. Phys. 120, e612 (2024).
Shi, Q. et al. Transforming informed consent generation using large language models: mixed methods study. JMIR Med. Inform. 13, e68139 (2025).
Rudra, P., Balke, W.-T., Kacprowski, T., Ursin, F. & Salloch, S. Large language models for surgical informed consent: an ethical perspective on simulated empathy. J. Med. Ethics (2025).
Ravfogel, S., Goldberg, Y. & Goldberger, J. Conformal nucleus sampling. In Findings of the Association for Computational Linguistics 27–34 (ACL, 2023).
Minh, N. N. et al. Turning up the heat: min-p sampling for creative and coherent LLM outputs. In 13th International Conference on Learning Representations (ICLR, 2025).
Zhou, K., Yang, J., Loy, C. C. & Liu, Z. Conditional prompt learning for vision-language models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16816–16825 (IEEE, 2022).
Khasentino, J. et al. A personal health large language model for sleep and fitness coaching. Nat. Med. 31, 3394–3403 (2025).
Wen, J. et al. The genetic architecture of multimodal human brain age. Nat. Commun. 15, 2604 (2024).
Mizrahi, D. et al. 4M: massively multimodal masked modeling. Adv. Neural Inf. Process. Syst. 36, 58363–58408 (2023).
Meng, X., Sun, K., Xu, J., He, X. & Shen, D. Multi-modal modality-masked diffusion network for brain MRI synthesis with random modality missing. IEEE Trans. Med. Imaging 43, 2587–2598 (2024).
Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief. Bioinform. 23, bbab569 (2022).
Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).
Johnson, R., Li, M. M., Noori, A., Queen, O. & Zitnik, M. Graph artificial intelligence in medicine. Annu. Rev. Biomed. Data Sci. 7, 345–368 (2024).
Kline, A. et al. Multimodal machine learning in precision health: a scoping review. npj Digit. Med. 5, 171 (2022).
Huang, Y. et al. Multimodal AI predicts clinical outcomes of drug combinations from preclinical data. Preprint at (2025).
Zhang, Y. et al. Multiple heads are better than one: mixture of modality knowledge experts for entity representation learning. In 13th International Conference on Learning Representations (ICLR, 2025).
Bao, H. et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst. 35, 32897–32912 (2022).
Yun, S. et al. Flex-MoE: modeling arbitrary modality combination via the flexible mixture-of-experts. Adv. Neural Inf. Process. Syst. 37, 98782–98805 (2024).
Cho, M. et al. Cocoon: robust multi-modal perception with uncertainty-aware sensor fusion. In 13th International Conference on Learning Representations (ICLR, 2025).
Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).
McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).
Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187, 6125–6151 (2024).
Guo, D. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).
Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at (2025).
Qu, X. et al. A survey of efficient reasoning for large reasoning models: language, multimodality, and beyond. Preprint at (2025).
Besta, M. et al. Reasoning language models: a blueprint. Preprint at (2025).
Johnson, R. et al. ClinVec: unified embeddings of clinical codes enable knowledge-grounded AI in medicine. Preprint at medRxiv (2025).
Wallace, E. et al. Managing patients with multimorbidity in primary care. BMJ 350, h176 (2015).
Spillmann, R. C. et al. A window into living with an undiagnosed disease: illness narratives from the Undiagnosed Diseases Network. Orphanet J. Rare Dis. 12, 1–11 (2017).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. Adv. Neural Inf. Process. Syst. 36, 53728–53741 (2023).
Nathani, D. et al. MLGym: a new framework and benchmark for advancing AI research agents. Preprint at https://doi.org/10.48550/arXiv.2502.14499 (2025).
Jiang, Y. et al. MedAgentBench: a virtual EHR environment to benchmark medical LLM agents. NEJM AI 2, AIdbp2500144 (2025).
Kazemi, M. et al. BIG-bench extra hard. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers) 26473–26501 (ACL, 2025).
Liang, P. et al. Holistic evaluation of language models. Preprint at (2023).
Choi, H. K., Khanov, M., Wei, H. & Li, Y. How contaminated is your benchmark? Measuring dataset leakage in large language models with kernel divergence. In 13th International Conference on Learning Representations (ICLR, 2025).
Ektefaie, Y. et al. Evaluating generalizability of artificial intelligence models for molecular datasets. Nat. Mach. Intell. 6, 1512–1524 (2024).
Bourlon, M. T. et al. Envisioning academic global oncologists: proposed competencies for global oncology training from ASCO. JCO Glob. Oncol. 10, e2300157 (2024).
Johnson-Peretz, J. et al. Geographical, social, and political contexts of tuberculosis control and intervention, as reported by mid-level health managers in Uganda: ‘the activity around town’. Soc. Sci. Med. 338, 116363 (2023).
Ning, Y. et al. An ethics assessment tool for artificial intelligence implementation in healthcare: CARE-AI. Nat. Med. 30, 3038–3039 (2024).
Boverhof, B.-J. et al. Radiology AI Deployment and Assessment Rubric (RADAR) to bring value-based AI into radiological practice. Insights Imaging 15, 34 (2024).
Dagan, N. et al. Evaluation of AI solutions in health care organizations — the OPTICA tool. NEJM AI 1, AIcs2300269 (2024).
Borja, N. A. et al. Advancing equity in rare disease diagnosis: insights from the Undiagnosed Diseases Network. Am. J. Med. Genet. A 197, e63904 (2025).
Williams, J. S., Walker, R. J. & Egede, L. E. Achieving equity in an evolving healthcare system: opportunities and challenges. Am. J. Med. Sci. 351, 33–43 (2016).
Pool, J., Indulska, M. & Sadiq, S. Large language models and generative AI in telehealth: a responsible use lens. J. Am. Med. Inform. Assoc. 31, 2125–2136 (2024).
Yu, K.-H., Healey, E., Leong, T.-Y., Kohane, I. S. & Manrai, A. K. Medical artificial intelligence and human values. N. Engl. J. Med. 390, 1895–1904 (2024).
Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459–9474 (2020).
Wei, J. et al. Finetuned language models are zero-shot learners. In 10th International Conference on Learning Representations (ICLR, 2022).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Gururangan, S. et al. Don’t stop pretraining: adapt language models to domains and tasks. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 8342–8360 (ACL, 2020).
Schick, T. et al. Toolformer: language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 36, 68539–68551 (2023).



