Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Habicht, J. et al. Closing the accessibility gap to mental health treatment with a personalized self-referral chatbot. Nat. Med. 30, 595–602 (2024).
Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
Wan, P. et al. Outpatient reception via collaboration between nurses and a large language model: a randomized controlled trial. Nat. Med. 30, 2878–2885 (2024).
Huang, K. et al. A foundation model for clinician-centered drug repurposing. Nat. Med. 30, 3601–3613 (2024).
Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).
Liu, X. et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 31, 932–942 (2025).
Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).
Johri, S. et al. An evaluation framework for clinical use of large language models in patient interaction tasks. Nat. Med. 31, 77–86 (2025).
Ao, G. et al. Comparative analysis of large language models on rare disease identification. Orphanet J. Rare Dis. 20, 150 (2025).
Shyr, C. Large language models for rare disease diagnosis at the Undiagnosed Diseases Network. JAMA Netw. Open 8, e2528538 (2025).
Weiner, S. J. & Schwartz, A. Listening for What Matters: Avoiding Contextual Errors in Health Care (Oxford Univ. Press, 2023).
Yu, K.-H. & Kohane, I. S. Framing the challenges of artificial intelligence in medicine. BMJ Qual. Saf. 28, 238–241 (2019).
Zhang, S., Liu, Q., Qin, G., Naumann, T. & Poon, H. Med-RLVR: emerging medical reasoning from a 3B base model via reinforcement learning. Preprint at (2025).
Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).
McDermott, M. B. A., Yap, B., Szolovits, P. & Zitnik, M. Structure-inducing pre-training. Nat. Mach. Intell. 5, 612–621 (2023).
Guo, L. L. et al. A multi-center study on the adaptability of a shared foundation model for electronic health records. npj Digit. Med. 7, 171 (2024).
Wornow, M. et al. The shaky foundations of large language models and foundation models for electronic health records. npj Digit. Med. 6, 135 (2023).
Pais, C. et al. Large language models for preventing medication direction errors in online pharmacies. Nat. Med. 30, 1574–1582 (2024).
Sabuncu, M. R., Wang, A. Q. & Nguyen, M. Ethical use of artificial intelligence in medical diagnostics demands a focus on accuracy, not fairness. NEJM AI 2, AIp2400672 (2024).
Li, M. M. et al. Contextual AI models for single-cell protein biology. Nat. Methods 21, 1546–1557 (2024).
Kather, J. N., Ferber, D., Wiest, I. C., Gilbert, S. & Truhn, D. Large language models could make natural language again the universal interface of healthcare. Nat. Med. 30, 2708–2710 (2024).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Liu, H. et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Adv. Neural Inf. Process. Syst. 35, 1950–1965 (2022).
Pan, J., Gao, T., Chen, H. & Chen, D. What in-context learning ‘learns’ in-context: disentangling task recognition and task learning. In Findings of the Association for Computational Linguistics 8298–8319 (ACL, 2023).
Min, S. et al. Rethinking the role of demonstrations: what makes in-context learning work? In Proc. 2022 Conference on Empirical Methods in Natural Language Processing 11048–11064 (ACL, 2022).
Chen, B., Zhang, Z., Langrené, N. & Zhu, S. Unleashing the potential of prompt engineering for large language models. Patterns 6, 101260 (2025).
Shen, S. et al. Multitask vision-language prompt tuning. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 5656–5667 (IEEE, 2024).
Wang, W. et al. VisionLLM: large language model is also an open-ended decoder for vision-centric tasks. Adv. Neural Inf. Process. Syst. 36, 61501–61513 (2023).
Tanwani, A. K., Barral, J. & Freedman, D. RepsNet: combining vision with language for automated medical reports. In International Conference on Medical Image Computing and Computer-Assisted Intervention 714–724 (Springer, 2022).
Shentu, J. & Al Moubayed, N. CXR-IRGen: an integrated vision and language model for the generation of clinically accurate chest X-ray image-report pairs. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (IEEE, 2024).
Wu, S. et al. CollabLLM: from passive responders to active collaborators. In Proc. 42nd International Conference on Machine Learning (PMLR, 2025).
Alsentzer, E. et al. Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. npj Digit. Med. 8, 380 (2025).
Goh, E. et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw. Open 7, e2440969 (2024).
Wang, L. et al. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. npj Digit. Med. 7, 41 (2024).
Khattab, O. et al. DSPy: compiling declarative language model calls into state-of-the-art pipelines. In International Conference on Learning Representations (ICLR, 2024).
Yuksekgonul, M. et al. Optimizing generative AI by backpropagating language model feedback. Nature 639, 609–616 (2025).
Vaziri, M., Mandel, L., Spiess, C. & Hirzel, M. PDL: a declarative prompt programming language. Preprint at (2024).
Lu, Y. et al. Towards doctor-like reasoning: Medical RAG fusing knowledge with patient analogy via textual gradients. In 39th Conference on Neural Information Processing Systems (NeurIPS, 2025).
Maharjan, J. et al. OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Sci. Rep. 14, 14156 (2024).
Nori, H. et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. Preprint at (2023).
Wu, S., Koo, M., Scalzo, F. & Kurtz, I. AutoMedPrompt: a new framework for optimizing LLM medical prompts using textual gradients. Preprint at (2025).
Yu, F. et al. Heterogeneity and predictors of the effects of AI assistance on radiologists. Nat. Med. 30, 837–849 (2024).
Rrv, A., Tyagi, N., Uddin, M. N., Varshney, N. & Baral, C. Chaos with keywords: exposing large language models sycophancy to misleading keywords and evaluating defense strategies. In Findings of the Association for Computational Linguistics 12717–12733 (ACL, 2024).
Fanous, A. et al. SycEval: evaluating LLM sycophancy. In Proc. AAAI/ACM Conference on AI, Ethics, and Society 8, 893–900 (ACM, 2025).
Su, X. et al. KGARevion: an AI agent for knowledge-intensive biomedical QA. In International Conference on Learning Representations (ICLR, 2025).
Zhang, G. et al. Leveraging long context in retrieval augmented language models for medical question answering. npj Digit. Med. 8, 239 (2025).
Ke, Y. H. et al. Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. npj Digit. Med. 8, 187 (2025).
Kresevic, S. et al. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. npj Digit. Med. 7, 102 (2024).
Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. npj Digit. Med. 8, 45 (2025).
Asai, A., Wu, Z., Wang, Y., Sil, A. & Hajishirzi, H. Self-RAG: learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations (ICLR, 2024).
Yang, D., Zeng, L., Rao, J. & Zhang, Y. Knowing you don’t know: learning when to continue search in multi-round RAG via self-practicing. In Proc. 48th International ACM SIGIR Conference on Research and Development in Information Retrieval 1305–1315 (ACM, 2025).
Islam, S. B. et al. Open-RAG: enhanced retrieval augmented reasoning with open-source large language models. In Findings of the Association for Computational Linguistics 14231–14244 (ACL, 2024).
Jeong, S., Baek, J., Cho, S., Hwang, S. J. & Park, J. C. Adaptive-RAG: learning to adapt retrieval-augmented large language models through question complexity. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1: Long Papers) 7036–7050 (ACL, 2024).
Yang, R. et al. Retrieval-augmented generation for generative artificial intelligence in health care. npj Health Syst. 2, 2 (2025).
Anisuzzaman, D. M., Malins, J. G., Friedman, P. A. & Attia, Z. I. Fine-tuning large language models for specialized use cases. Mayo Clin. Proc. Digit. Health 3, 100184 (2025).
Wiest, I. C. et al. Deidentifying medical documents with local, privacy-preserving large language models: the LLM-Anonymizer. NEJM AI 2, 4 (2025).
Croskerry, P. A universal model of diagnostic reasoning. Acad. Med. 84, 1022–1028 (2009).
Geiping, J. et al. Scaling up test-time compute with latent reasoning: a recurrent depth approach. In 39th Annual Conference on Neural Information Processing Systems (NeurIPS, 2025).
Makarov, N. et al. Large language models forecast patient health trajectories enabling digital twins. npj Digit. Med. 8, 588 (2025).
Renc, P. et al. Zero shot health trajectory prediction using transformer. npj Digit. Med. 7, 256 (2024).
Wang, J. et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat. Med. 31, 609–617 (2025).
Rao, V. M. et al. Multimodal generative AI for medical image interpretation. Nature 639, 888–896 (2025).
Duan, Y., Xu, C., Pei, J., Han, J. & Li, C. Pre-train and plug-in: flexible conditional text generation with variational auto-encoders. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 253–262 (ACL, 2020).
Epstein, D., Jabri, A., Poole, B., Efros, A. & Holynski, A. Diffusion self-guidance for controllable image generation. Adv. Neural Inf. Process. Syst. 36, 16222–16239 (2023).
Li, Z. et al. ControlAR: controllable image generation with autoregressive models. In 13th International Conference on Learning Representations (ICLR, 2025).
Beattie, J. et al. Using large language models to create patient-centered consent forms. Int. J. Radiat. Oncol. Biol. Phys. 120, e612 (2024).
Shi, Q. et al. Transforming informed consent generation using large language models: mixed methods study. JMIR Med. Inform. 13, e68139 (2025).
Rudra, P., Balke, W.-T., Kacprowski, T., Ursin, F. & Salloch, S. Large language models for surgical informed consent: an ethical perspective on simulated empathy. J. Med. Ethics (2025).
Ravfogel, S., Goldberg, Y. & Goldberger, J. Conformal nucleus sampling. In Findings of the Association for Computational Linguistics 27–34 (ACL, 2023).
Minh, N. N. et al. Turning up the heat: min-p sampling for creative and coherent LLM outputs. In 13th International Conference on Learning Representations (ICLR, 2025).
Zhou, K., Yang, J., Loy, C. C. & Liu, Z. Conditional prompt learning for vision-language models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16816–16825 (IEEE, 2022).
Khasentino, J. et al. A personal health large language model for sleep and fitness coaching. Nat. Med. 31, 3394–3403 (2025).
Wen, J. et al. The genetic architecture of multimodal human brain age. Nat. Commun. 15, 2604 (2024).
Mizrahi, D. et al. 4M: massively multimodal masked modeling. Adv. Neural Inf. Process. Syst. 36, 58363–58408 (2023).
Meng, X., Sun, K., Xu, J., He, X. & Shen, D. Multi-modal modality-masked diffusion network for brain MRI synthesis with random modality missing. IEEE Trans. Med. Imaging 43, 2587–2598 (2024).
Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief. Bioinform. 23, bbab569 (2022).
Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).
Johnson, R., Li, M. M., Noori, A., Queen, O. & Zitnik, M. Graph artificial intelligence in medicine. Annu. Rev. Biomed. Data Sci. 7, 345–368 (2024).
Kline, A. et al. Multimodal machine learning in precision health: a scoping review. npj Digit. Med. 5, 171 (2022).
Huang, Y. et al. Multimodal AI predicts clinical outcomes of drug combinations from preclinical data. Preprint at (2025).
Zhang, Y. et al. Multiple heads are better than one: mixture of modality knowledge experts for entity representation learning. In 13th International Conference on Learning Representations (ICLR, 2025).
Bao, H. et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst. 35, 32897–32912 (2022).
Yun, S. et al. Flex-MoE: modeling arbitrary modality combination via the flexible mixture-of-experts. Adv. Neural Inf. Process. Syst. 37, 98782–98805 (2024).
Cho, M. et al. Cocoon: robust multi-modal perception with uncertainty-aware sensor fusion. In 13th International Conference on Learning Representations (ICLR, 2025).
Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).
McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).
Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187, 6125–6151 (2024).
Guo, D. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).
Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at (2025).
Qu, X. et al. A survey of efficient reasoning for large reasoning models: language, multimodality, and beyond. Preprint at (2025).
Besta, M. et al. Reasoning language models: a blueprint. Preprint at (2025).
Johnson, R. et al. ClinVec: unified embeddings of clinical codes enable knowledge-grounded AI in medicine. Preprint at medRxiv (2025).
Wallace, E. et al. Managing patients with multimorbidity in primary care. BMJ 350, h176 (2015).
Spillmann, R. C. et al. A window into living with an undiagnosed disease: illness narratives from the Undiagnosed Diseases Network. Orphanet J. Rare Dis. 12, 1–11 (2017).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. Adv. Neural Inf. Process. Syst. 36, 53728–53741 (2023).
Nathani, D. et al. MLGym: a new framework and benchmark for advancing AI research agents. Preprint at https://doi.org/10.48550/arXiv.2502.14499 (2025).
Jiang, Y. et al. MedAgentBench: a virtual EHR environment to benchmark medical LLM agents. NEJM AI 2, AIdbp2500144 (2025).
Kazemi, M. et al. BIG-bench extra hard. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers) 26473–26501 (ACL, 2025).
Liang, P. et al. Holistic evaluation of language models. Preprint at (2023).
Choi, H. K., Khanov, M., Wei, H. & Li, Y. How contaminated is your benchmark? Measuring dataset leakage in large language models with kernel divergence. In 13th International Conference on Learning Representations (ICLR, 2025).
Ektefaie, Y. et al. Evaluating generalizability of artificial intelligence models for molecular datasets. Nat. Mach. Intell. 6, 1512–1524 (2024).
Bourlon, M. T. et al. Envisioning academic global oncologists: proposed competencies for global oncology training from ASCO. JCO Glob. Oncol. 10, e2300157 (2024).
Johnson-Peretz, J. et al. Geographical, social, and political contexts of tuberculosis control and intervention, as reported by mid-level health managers in Uganda: ‘the activity around town’. Soc. Sci. Med. 338, 116363 (2023).
Ning, Y. et al. An ethics assessment tool for artificial intelligence implementation in healthcare: CARE-AI. Nat. Med. 30, 3038–3039 (2024).
Boverhof, B.-J. et al. Radiology AI Deployment and Assessment Rubric (RADAR) to bring value-based AI into radiological practice. Insights Imaging 15, 34 (2024).
Dagan, N. et al. Evaluation of AI solutions in health care organizations — the OPTICA tool. NEJM AI 1, AIcs2300269 (2024).
Borja, N. A. et al. Advancing equity in rare disease diagnosis: insights from the Undiagnosed Diseases Network. Am. J. Med. Genet. A 197, e63904 (2025).
Williams, J. S., Walker, R. J. & Egede, L. E. Achieving equity in an evolving healthcare system: opportunities and challenges. Am. J. Med. Sci. 351, 33–43 (2016).
Pool, J., Indulska, M. & Sadiq, S. Large language models and generative AI in telehealth: a responsible use lens. J. Am. Med. Inform. Assoc. 31, 2125–2136 (2024).
Yu, K.-H., Healey, E., Leong, T.-Y., Kohane, I. S. & Manrai, A. K. Medical artificial intelligence and human values. N. Engl. J. Med. 390, 1895–1904 (2024).
Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459–9474 (2020).
Wei, J. et al. Finetuned language models are zero-shot learners. In 10th International Conference on Learning Representations (ICLR, 2022).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Gururangan, S. et al. Don’t stop pretraining: adapt language models to domains and tasks. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 8342–8360 (ACL, 2020).
Schick, T. et al. Toolformer: language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 36, 68539–68551 (2023).



