Participants in the challenge tested and debugged robots working on different tasks. | Source: AGIBOT
AGIBOT Innovation Technology Co. recently hosted the AGIBOT World Challenge 2026 alongside ICRA 2026 in Vienna. The event gathered 526 research and enterprise teams from 27 countries to compete in two embodied AI categories: “Reasoning to Action” and “World Model.”
Shanghai-based AGIBOT noted that the competition underscored a significant shift in how embodied AI is assessed. The company explained that the industry is moving past simulation-based scores toward closed-loop testing on actual robots, real-world tasks, and standardized benchmarks.
The competition used a benchmark-driven approach that merged online automated evaluation with an offline real-robot final in Vienna. Leveraging AGIBOT’s EWMBench and Genie Sim Benchmark, the unified framework allowed for automated testing, standardized metrics, and reproducible outcomes.
During the offline final, finalist teams performed tasks using the AGIBOT G2 humanoid robot. By integrating real-robot validation into the evaluation process, the competition emphasized robot stability, real-world adaptability, and long-horizon task reliability in its scoring system. The company, also known as Zhiyuan Robotics Co., stated that this approach more closely aligns technical evaluation with real-world deployment requirements.
The challenge attracted research and industry teams from prominent institutions and companies, including the Chinese Academy of Sciences, Tsinghua University, the University of Science and Technology of China, the University of California San Diego, Russia’s Sber Robotics Center, Alibaba, Amap, and vivo. Over 100 teams exceeded the official baseline.
What distinguishes the R2A and WM tracks?
The two tracks at the AGIBOT World Challenge 2026 mirrored the broader progression of embodied AI from task execution toward understanding, prediction, and decision-making, according to AGIBOT.
The Reasoning to Action (R2A) track assessed how robots interpret tasks, plan actions, and carry them out in physical environments. An evolution of the 2025 Manipulation track, it broadened the evaluation from action execution alone to the complete process of environment understanding, task planning, and physical execution.
The World Model (WM) track concentrated on how AI systems forecast physical-world changes and model interactions based on robot actions and sensor inputs.
Teams developed reasoning-and-manipulation models using the AGIBOT WORLD open-source dataset and evaluated them in Genie Sim 3.0. The benchmark covered language understanding, spatial reasoning, atomic skills, disturbance adaptation, and zero-shot transfer.
In the final standings, PrismBot from vivo claimed the championship with 43.47 points, followed by Shanghai RoboParty’s RP-VLA with 35.66 points and Russia’s GreenVLA with 33.19 points.
AGIBOT adds supermarket benchmark track to the challenge
Alongside the competition, AGIBOT and Dexmal introduced a supermarket benchmark track centered on end-to-end decision-making and whole-body control. This track included non-ideal physical interactions, such as object drops and grasping failures, to better represent the complexity of real-world interaction and offer a more practical evaluation framework for world model research.
Set in a realistic retail environment, the track required models to complete the full mobile manipulation process, from autonomous navigation and item picking to item transport and placement, under physical constraints like shelf height limits and randomized item placement. Through API-based remote control, participants’ algorithms directly operated real robots, establishing a practical benchmark for assessing embodied intelligence in deployment-focused scenarios.
In the World Model (WM) track, NeoVerse-ABot, a joint team from the Institute of Automation of the Chinese Academy of Sciences and Amap CV Lab, secured first place. The PAI@IAII team from the Institute of Industrial Artificial Intelligence at the Chinese Academy of Sciences ranked second. The Loop team from the University of Science and Technology of China placed third.

With the World Challenge, AGIBOT hoped to contribute to a more practical and reproducible evaluation framework for embodied AI. | Source: AGIBOT
AGIBOT unveils full-stack toolchain for robot validation
Beyond the competition itself, AGIBOT released a full-stack toolchain covering real-world data, simulation evaluation, and real-robot testing. The toolchain included the AGIBOT WORLD open-source dataset, Genie Sim 3.0, and the AGIBOT G2 robot platform, helping developers validate models from training through simulation to physical deployment.
EWMBench and Genie Sim Benchmark provided standardized metrics, automated evaluation, and comparable results across simulation and physical testing. They tackled common challenges such as inconsistent evaluation criteria and the disparity between simulated performance and real-world deployment.
AGIBOT stated that it will merge the technical and ecosystem resources developed through the competition with its ongoing benchmark development and open-source initiatives. The company also plans to launch an online simulation leaderboard, introduce additional test tasks and diversified benchmarks, and support more comprehensive quantitative evaluation of model capabilities.
Furthermore, AGIBOT stated it will continue to refine its benchmarks and full-stack toolchain, collaborating with global research institutions, developers, and industry partners. Its stated objective is to help embodied AI transition from individual algorithmic breakthroughs toward systems that can be deployed and scaled in real-world environments.
In other benchmark news, Fraunhofer IPA recently introduced a new test benchmark for humanoid robots, and NIST proposed its own baseline performance benchmark for humanoids.