Perception-Driven World Models for Embodied AI

[Submitted on 1 Jun 2026 (v1), last revised 16 Jun 2026 (v3)]

Authors: NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski


Paper title: Cosmos 3: Omnimodal World Models for Physical AI, by NVIDIA: Aditi and 293 other authors

Abstract: We present Cosmos 3, a family of omnimodal world models that process and generate language, image, video, audio, and action sequences within a single unified mixture-of-transformers framework. Its flexible input-output configuration lets Cosmos 3 bring together the key modalities for Physical AI, combining vision-language models, video generators, world simulators, and world-action models in one system. Our results show that Cosmos 3 sets a new state of the art across a wide range of understanding and generation tasks, demonstrating that omnimodal world models can serve as scalable, general-purpose backbones for embodied agents. At the time this technical report was released, our post-trained Cosmos 3 models were ranked as the top open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and as the leading policy model by RoboArena. To support open research and faster adoption in Physical AI, we are releasing our code, model checkpoints, curated synthetic datasets, and evaluation benchmarks under the Linux Foundation’s OpenMDW-1.1 License at this https URL and this https URL. The project website is available at this https URL.

Submission history

Submitted by: Yin Cui
[v1] Mon, 1 Jun 2026 19:12:30 UTC (30,203 KB)
[v2] Fri, 5 Jun 2026 16:34:56 UTC (30,203 KB)
[v3] Tue, 16 Jun 2026 23:18:34 UTC (30,173 KB)
