Recently, a small four-legged robot pivoted sharply to the right on a workbench. Its mirrored left turn, however, was sluggish and lost traction. Because the legs ended up in different servo zones and placed uneven loads on the body, the identical command produced two distinct outcomes. The programming was symmetrical, but the physical interactions were not.
The Llama comparison holds up only until the model needs to control physical hardware. The initial Llama research provided software developers with a ready-made foundation. A team that hadn’t funded the original training could modify the model, reduce its size, and deploy it through standard software pipelines. The model weights were valuable because other teams already possessed the infrastructure to transform them into functioning applications.
Robotic models follow a similar pattern, but a robot policy doesn’t operate independently. A local control system translates the policy’s output into movement on the physical robot through its controller, all within the workstation’s safety boundaries. Broader model access will enable robots to tackle more complex tasks. The real competitive edge will come from converting that capability into reliable operations on existing equipment, complete with diagnostic records that technicians can reference months down the line.
Robot policies are becoming more accessible
Google DeepMind’s Open X-Embodiment initiative aggregated robotic data across multiple organizations and hardware platforms. Their RT-X findings demonstrated that training across diverse robot designs enhances adaptability in certain scenarios, rather than requiring each system to learn exclusively from its own limited data.
DeepMind’s latest offerings divide responsibilities across the robotic architecture. Gemini Robotics 1.5 is a vision-language-action system that processes visual input and verbal instructions to generate movement commands. Gemini Robotics-ER 1.6 operates at a higher level, managing spatial analysis and task sequencing while enabling progress monitoring and tool integration.
NVIDIA has pursued a similar distribution strategy, with GR00T releases and Isaac models reaching developer platforms like Hugging Face’s LeRobot. From a deployment standpoint, the Llama narrative aligns with the trend that sophisticated robot policies are becoming increasingly available to developers.
Set against Crunchbase’s tally of nearly $14 billion in robotics venture investment in 2025, the individual funding rounds add up quickly. Skild AI secured $1.4 billion for a universal robotics model, while Physical Intelligence is reportedly negotiating an additional $1 billion at a valuation exceeding $11 billion. Yann LeCun’s Advanced Machine Intelligence raised $1.03 billion pursuing an alternative world modeling strategy, and Wayve completed a $1.2 billion Series D for self-driving technology. These investments presume that robot intelligence will become reusable before the industry has validated that the deployment pathway functions across diverse systems.
OpenVLA is a 7-billion-parameter open vision-language-action model trained on 970,000 robotic manipulation sequences from Open X-Embodiment. Physical Intelligence addresses the action component through FAST, which transforms robot movement segments into tokens. Their openpi repository illustrates the work still required once a model is accessible. A team executes inference, calibrates using its own robot data, and then tests the outcome on the target hardware. Even this approach incurs hardware costs. The repository specifies over 8GB of GPU memory for inference, 22.5GB for LoRA fine-tuning, and 70GB for complete fine-tuning.
Where transfer actually fails
A robotic workstation can pass validation and execute smoothly through most operations. The greater challenge lies in the occasional failures, where minor physical alterations create a different task than what the policy encountered during calibration.
At client locations, hardware transfer typically fails due to routine modifications. Camera positioning and gripper flexibility shift after final approval, fixture reference points move with the client’s workflow, and residue accumulates over weeks of operation before error recovery becomes inconsistent. Site drift represents the gap between the robot that passed validation and the robot functioning within the client’s actual process.
Domain randomization trains across numerous simulated variations, but the real world keeps presenting new ones daily. A command can maintain the same overall objective yet yield different results when contact follows a different force path. One side of a mechanism can transmit force through the structure differently from the other, so a movement that succeeds in one direction can cause resistance, instability, or lost contact in the opposite direction. When this occurs, refining the command won’t correct behavior whose actual flaw lies in timing.
Hardware-aware models mitigate one aspect of the problem by representing a robot’s physical characteristics through kinematics, joint properties, prompts, or tokens. A policy that incorporates joint constraints and actuator behavior begins with a more accurate system description. Some unknowns become quantifiable parameters, but those measurements begin deteriorating as soon as the robot enters production. Friction shifts, tools degrade, and loads fluctuate with each process. Recovery movements can also generate conditions that the original calibration didn’t anticipate. Improved hardware models make deployments more traceable, but they do not make them general.
On an actual production line, the initial investigation is often straightforward. The team contrasts the last successful cycle with the failed one before questioning the policy. The discrepancy appears in positioning, power consumption, or the fixture reference point surrounding the task. The model might be generating precisely what it produced during validation testing, while the actual task has drifted from the data that trained it.
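That first comparison can be pictured with a minimal Python sketch. The record fields (tcp_offset_mm, motor_current_a, fixture_datum_mm) and the tolerance values are hypothetical, chosen only for illustration; a real cell would compare whatever signals its controller actually logs.

```python
from dataclasses import dataclass, asdict


@dataclass
class CycleRecord:
    """Hypothetical per-cycle summary; field names are illustrative only."""
    tcp_offset_mm: float     # tool position error at the contact point
    motor_current_a: float   # peak motor current during the cycle
    fixture_datum_mm: float  # measured fixture reference position


# Assumed tolerances; a real cell would derive these from validation data.
TOLERANCES = {
    "tcp_offset_mm": 0.5,
    "motor_current_a": 0.3,
    "fixture_datum_mm": 0.2,
}


def diff_cycles(last_good: CycleRecord, failed: CycleRecord) -> dict:
    """Return only the signals whose change exceeds its tolerance."""
    good, bad = asdict(last_good), asdict(failed)
    return {
        name: round(bad[name] - good[name], 3)
        for name, tol in TOLERANCES.items()
        if abs(bad[name] - good[name]) > tol
    }


last_good = CycleRecord(tcp_offset_mm=0.1, motor_current_a=2.0, fixture_datum_mm=0.0)
failed = CycleRecord(tcp_offset_mm=0.2, motor_current_a=2.1, fixture_datum_mm=0.9)
drifted = diff_cycles(last_good, failed)
print(drifted)
```

In this made-up case only the fixture reference has moved beyond its tolerance, which points the investigation at the workstation rather than at the policy.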
The valuable data emerges after the failure
Robotic data carries different requirements than language data. Bessemer Venture Partners has calculated the total global robot manipulation data at approximately 300,000 hours, compared with roughly 1 billion hours of online video and 300 trillion text tokens. Language models could harvest the internet. Robots must construct most of their knowledge base from operational machines.
NVIDIA is attempting to expand that knowledge base through another approach. They report that GR00T N1.7 was pre-trained on more than 20,000 hours of human first-person video rather than robot remote operation, wagering that human perspective footage contains valuable manipulation insights.
An equally critical component of the dataset is the failure context, encompassing the controller status, corrective action, and physical explanation. A camera might capture that the robot failed, but it might not clarify why the gripper dropped the component or why the safety shutdown activated. It might also overlook which corrective action restored the workstation to operation. Logs become problematic in another way when they disconnect from the physical occurrence. A log can track progress against a specific control metric while the robot is visibly struggling with the task. It can compile the numbers software prioritizes while generating performance that would dissatisfy a client. Logs prove their value only when the team can correlate them to what actually occurred in the workstation.
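As a rough illustration of what failure context could look like in a log, the Python sketch below defines a record that carries the controller status, the corrective action, and the physical explanation together. The class, its field names, and the example values are all assumptions made for this sketch, not a schema from any vendor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureRecord:
    """Hypothetical failure log entry; every field name here is illustrative."""
    cycle_id: int
    controller_state: str                  # fault or mode the controller reported
    recovery_action: Optional[str] = None  # what brought the workstation back
    physical_cause: Optional[str] = None   # what a technician found at the cell

    def missing_context(self) -> List[str]:
        """List the context fields that were never filled in."""
        fields = ("recovery_action", "physical_cause")
        return [name for name in fields if getattr(self, name) is None]


record = FailureRecord(
    cycle_id=4812,
    controller_state="protective_stop",
    recovery_action="operator reseated part and restarted cycle",
)
gaps = record.missing_context()
print(gaps)
```

A record that still reports gaps is one a technician cannot fully tie back to what happened in the workstation.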
Remote operation and simulation can produce data before a system reaches the production floor, but the most valuable records come from equipped robots executing client processes with sufficient context to analyze failures afterward. A company that transforms error history into more reliable corrective movements gains more insight from each deployment than one that archives only successful operations.
The technician must be able to distinguish a policy failure from a slipped tool, a shifted fixture, or a recovery path that made the next cycle worse.
Simulated futures meet reality
World models exist to evaluate decisions before any hardware is put at risk. World Labs’ Marble constructs 3D environments from text prompts or image inputs and exports them into formats suited for simulation and design review. In the autonomous driving space, Wayve’s GAIA-3 follows a comparable approach — a 15-billion-parameter world model built for realistic, controllable offline testing of self-driving AI.
World Action Models push world modeling closer to actual control. DreamZero describes the architecture as a model that forecasts future world states and actions from video. NVIDIA showcased GR00T N2 within that research direction, stating that it succeeds at novel tasks in unfamiliar environments more than twice as often as leading VLA models and that it leads on the MolmoSpaces and RoboArena benchmarks. NVIDIA expects N2 to ship later this year.
Any generated action still has to pass through the controller before it becomes physical motion. Driving is bounded by road geometry and vehicle dynamics. Manipulation adds direct contact, and contact introduces failure modes that are far harder to replicate cleanly in simulation. Force closure can be off, seals degrade, and calibration can drift so gradually that the production line keeps running until it no longer produces consistent results.
Simulation gains real value when friction, actuator response, center of mass, and rate limits are measured rather than guessed. Even so, the team keeps the simulator calibrated against the actual hardware and monitors for the moment the real system has drifted past what the model represents. A well-calibrated simulator narrows the search space before anyone touches physical equipment, though it can never replace the validation of the system performing real tasks.
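One simple way to watch for that moment is to compare the parameters the simulator was calibrated with against fresh measurements from the robot. The Python sketch below does this; the parameter names, values, and the 10 percent relative tolerance are assumptions made purely for illustration.

```python
# Hypothetical parameters the simulator was last calibrated with.
SIM_PARAMS = {
    "friction_coeff": 0.40,
    "actuator_delay_ms": 12.0,
    "com_offset_mm": 5.0,
}


def stale_parameters(sim: dict, measured: dict, rel_tol: float = 0.10) -> list:
    """Return names of parameters whose measured value has drifted
    more than rel_tol (as a fraction) from the simulator's value."""
    stale = []
    for name, sim_value in sim.items():
        rel_err = abs(measured[name] - sim_value) / abs(sim_value)
        if rel_err > rel_tol:
            stale.append(name)
    return stale


# Made-up measurements taken on the real robot after weeks of operation.
measured = {
    "friction_coeff": 0.48,
    "actuator_delay_ms": 12.5,
    "com_offset_mm": 5.2,
}
needs_recalibration = stale_parameters(SIM_PARAMS, measured)
print(needs_recalibration)
```

In this example only friction has moved far enough to call for recalibration; the right threshold in practice depends on how sensitive the task is to each parameter.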
The controller is where the promise meets reality
Model output reaches the physical world through the controller. Agility Robotics has detailed a whole-body control model for Digit. The model is a compact LSTM with under 1 million parameters, trained in NVIDIA Isaac Sim on decades of simulated time over several days of real time.
Many VLA policies operate at the pace of task-level actions or action chunks. A typical industrial servo loop closes around 1 kHz. A model output only becomes useful once the controller translates it into executable motion within the robot’s physical constraints. The motion architecture can determine the outcome before the controller ever rejects a command. A trajectory built from precise poses can still contain pauses or poor contact timing. In repetitive motion, continuous phase alignment can matter more than pose refinement, and a recovery maneuver that appears cautious in command space can arrive too late at the contact point.
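The gap between policy rate and servo rate can be sketched in a few lines of Python. The example assumes a policy emitting joint waypoints at 10 Hz, a 1 kHz servo loop, and a per-tick motion limit of 0.002 rad; all three numbers and the function name are illustrative, and real controllers use more sophisticated trajectory generation than this linear interpolation.

```python
def upsample_chunk(waypoints, policy_hz, servo_hz, max_step):
    """Linearly interpolate policy waypoints to servo rate,
    clamping how far the setpoint may move in a single servo tick."""
    steps_per_segment = servo_hz // policy_hz
    setpoints = []
    current = waypoints[0]
    for start, end in zip(waypoints, waypoints[1:]):
        for i in range(1, steps_per_segment + 1):
            target = start + (end - start) * i / steps_per_segment
            # Rate limit: the controller refuses to move faster than max_step.
            delta = max(-max_step, min(max_step, target - current))
            current += delta
            setpoints.append(current)
    return setpoints


# Hypothetical action chunk: joint angles in radians, one per policy step.
chunk = [0.0, 0.1, 0.5]
setpoints = upsample_chunk(chunk, policy_hz=10, servo_hz=1000, max_step=0.002)
print(len(setpoints), round(setpoints[99], 3), round(setpoints[-1], 3))
```

The first segment fits within the rate limit and is tracked exactly. The second asks for twice the allowed speed, so the setpoint ends the chunk well short of the commanded 0.5 rad: nothing was rejected, yet the motion arrives late.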
Even a conservative post-processing step can introduce a new failure by shifting contact timing or pushing a recovery into a worse region of the robot’s local dynamics. Filtering can make the command appear smoother while placing the foot or tool late — much like the quadruped’s mirrored turn looked symmetrical in code but dragged at the point of contact. For industrial systems, the safety layer also defines what the learned layer is permitted to do when the model is uncertain or the machine state has shifted.
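The timing cost of smoothing can be shown with a toy Python example. It assumes a step command, a first-order low-pass filter with a smoothing factor of 0.1, and a contact threshold at 90 percent of the commanded value; these are illustrative choices, not values from any particular controller.

```python
def low_pass(signal, alpha):
    """First-order low-pass (exponential smoothing) of a command sequence."""
    filtered = []
    state = signal[0]
    for value in signal:
        state += alpha * (value - state)
        filtered.append(state)
    return filtered


def first_crossing(signal, threshold):
    """Index of the first tick at which the signal reaches the threshold."""
    for tick, value in enumerate(signal):
        if value >= threshold:
            return tick
    return None


# Hypothetical command: step from 0 to 1 at tick 10.
raw = [0.0] * 10 + [1.0] * 90
smooth = low_pass(raw, alpha=0.1)

raw_tick = first_crossing(raw, 0.9)
smooth_tick = first_crossing(smooth, 0.9)
delay_ticks = smooth_tick - raw_tick
print(raw_tick, smooth_tick, delay_ticks)
```

The filtered command looks gentler, but it reaches the contact threshold many ticks after the raw command does, which is the kind of shift in contact timing the paragraph above describes.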
Reuters reported that Skild AI’s model would operate on Foxconn assembly lines in Houston, where NVIDIA Blackwell GPU server racks are manufactured. NVIDIA has also described Skild’s collaborations with ABB Robotics and Universal Robots. The test extends beyond a startup model on a single production line, because a shared intelligence layer must cover established robot portfolios without undermining the service and safety standards those platforms demand.
The open question is not solely which model performs best. It is whether reusability originates first from the action representation, the controller interface, the simulation pipeline, or the diagnostic trail.
What cannot be simply downloaded
A genuine robotics Llama moment would require a team to take a model, adapt it to its own action representation, validate it on its own robot, and deploy functional behavior without assistance from the original model’s developers. Driving has demonstrated that fleet learning can scale across many vehicles, with Waymo publishing safety analyses covering over 170 million fully autonomous miles. Manipulation faces more resistance because workcells and products vary far more than road geometry, and customer data rights restrict what can be collected and reused.
The platform that wins broad distribution could attempt to pool fault data across every installation, the way driving fleets accumulate miles. Robot manipulation data is harder to pool because each customer’s process is distinct, contracts restrict the data, and a fault on one line frequently does not transfer to the next. The dataset stays fragmented even when the model itself does not.
The Llama moment in robotics will not arrive the day a policy becomes downloadable. It will arrive the day another team can take that policy, adapt it to its robot, release it into a customer’s process, and still understand what went wrong weeks later when the line stops repeating.

About the author
Deepak Jayaraj serves as vice president of hardware engineering and manufacturing at Four Growers, an agricultural robotics company headquartered in Pittsburgh. With over 15 years of experience spanning space robotics, medical devices, and AgTech, he focuses on helping robotics companies navigate the critical shift from prototype to scaled deployment and on the economics of hardware business models.



