Positronic Robotics evaluated four VLA models on bin-to-bin order picking. | Credit: Positronic Robotics
Positronic Robotics, which said it helps developers make robots work with artificial intelligence, has launched its “Physical AI Leaderboard,” or PhAIL. It’s an ongoing benchmark evaluating robotics foundation models on industrial tasks.
Founded in September 2025, Positronic said it has developed an open-source infrastructure to standardize and scale physical AI by bridging the gap between research foundation models and real-world robot production. The Springfield, Mo.-based company’s system uses a unified Python toolkit for the entire robotics lifecycle and the PhAIL benchmark.
PhAIL evaluates models on physical robot setups performing commercially relevant operations. Positronic Robotics has started with bin-to-bin order picking, one of the most common tasks in logistics and industrial automation. In this task, items are transferred one at a time from an inbound container to an outbound container.
The current evaluation rig uses a Franka Research 3 robotic arm paired with a Robotiq 2F-85 gripper in DROID-style configuration, a widely used and reproducible research platform.
PhAIL measures throughput and reliability
Physical AI has advanced rapidly in recent years, with foundation models capable of handling increasingly diverse manipulation tasks. But most benchmarks still rely on simulation or controlled laboratory conditions, and many public evaluations emphasize curated demonstration videos rather than sustained operation. For industrial deployment, two variables dominate: throughput and reliability.
PhAIL measures both directly. Every run is executed on real hardware, not in simulation. Model checkpoints are selected randomly and evaluated in blinded conditions. Every run is logged and published with synchronized video, robot telemetry, station metadata, and scoring artifacts.
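To make the idea of randomized, blinded evaluation concrete, here is a minimal Python sketch of one way such a schedule could be built. It is not Positronic's procedure: the function name, the opaque `model-NN` labels, the checkpoint identifiers, and the seeding scheme are all assumptions for illustration.

```python
import random


def blinded_schedule(checkpoints, runs_per_checkpoint, seed):
    """Build a randomized run order that hides checkpoint identities.

    Returns (schedule, key). `schedule` is a shuffled list of opaque labels,
    which is all the robot operator sees. `key` maps each label back to its
    checkpoint and would be withheld until scoring is complete.
    """
    rng = random.Random(seed)  # seeded so the schedule can be audited later
    labels = [f"model-{i:02d}" for i in range(len(checkpoints))]

    # Randomize which checkpoint sits behind which label.
    shuffled = list(checkpoints)
    rng.shuffle(shuffled)
    key = dict(zip(labels, shuffled))

    # Repeat each label, then randomize the order in which runs execute.
    schedule = [label for label in labels for _ in range(runs_per_checkpoint)]
    rng.shuffle(schedule)
    return schedule, key


# Hypothetical checkpoint identifiers, used only for this example.
checkpoints = ["openpi", "groot", "smolvla", "act"]
schedule, key = blinded_schedule(checkpoints, runs_per_checkpoint=3, seed=0)
print(schedule[:4])
```

Keeping the label-to-checkpoint key separate from the schedule is what makes the runs blind to the operator, while the fixed seed lets a third party regenerate and check the order afterward.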
From these runs, PhAIL computes units per hour (UPH) and mean time between failures or assists (MTBF/A) – the same metrics an operations manager would use to evaluate a deployment, rather than an academic “success rate.” The protocol is fully documented in the PhAIL white paper.
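As a rough illustration of how these two metrics relate to logged runs, the Python sketch below computes UPH and MTBF/A from a list of run records. The `Run` fields and the formulas used (units divided by operating hours, and operating time divided by the number of failures plus assists) are assumptions for illustration; the PhAIL white paper defines the actual protocol.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one evaluation run (field names are assumptions)."""
    duration_s: float  # operating time in seconds
    units_moved: int   # items successfully transferred bin to bin
    failures: int      # failures observed during the run
    assists: int       # human interventions during the run


def units_per_hour(runs):
    """Total units moved divided by total operating hours."""
    total_hours = sum(r.duration_s for r in runs) / 3600.0
    if total_hours == 0:
        raise ValueError("no operating time recorded")
    return sum(r.units_moved for r in runs) / total_hours


def mtbf_a_seconds(runs):
    """Operating seconds per failure-or-assist event; infinite if none occurred."""
    events = sum(r.failures + r.assists for r in runs)
    total_s = sum(r.duration_s for r in runs)
    return float("inf") if events == 0 else total_s / events


# Two made-up 30-minute runs, not PhAIL results.
runs = [
    Run(duration_s=1800, units_moved=60, failures=1, assists=2),
    Run(duration_s=1800, units_moved=40, failures=0, assists=1),
]
print(units_per_hour(runs))  # 100.0
print(mtbf_a_seconds(runs))  # 900.0
```

In this made-up example, 100 items moved in one hour of operation gives 100 UPH, and four failure-or-assist events over 3,600 seconds gives an MTBF/A of 900 seconds.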
The Physical AI Leaderboard itself is hardware-agnostic. Positronic Robotics said it plans to add robot embodiments in Q2 2026 to reflect the diversity of real-world deployments. Bin-to-bin picking is only the starting point, it said. The benchmark’s goal is to measure how well AI models perform on repetitive, economically important operations that occur hundreds of times per day in real facilities.
“We all dream about a robot that folds our laundry – but that’s a task that happens once a day. In factories and logistics, the same operation runs hundreds of times per shift, and most of those still aren’t solved,” said Sergey Arkhangelskiy, founder of Positronic Robotics. “Physical AI needs to prove itself there first, and PhAIL is how we measure whether it can.”
Positronic Robotics evaluates models
In the inaugural evaluations, four models were fine-tuned and tested: OpenPI 0.5 from Physical Intelligence, GR00T from NVIDIA, SmolVLA from HuggingFace/LeRobot, and ACT from LeRobot – as well as teleoperated and human baselines. The results show a measurable gap between current foundation models and human-level performance in both throughput and reliability on industrial picking tasks.
Positronic Robotics described it as calibration, a clear baseline that allows progress to be measured consistently over time. As new models are released, they can be evaluated under the same protocol, creating a continuous, comparable record of performance, it said.
The company asserted that PhAIL targets three structural issues in the physical AI ecosystem:
- Lack of objective measurement of industrial readiness. Most public metrics don’t reflect factory-floor constraints.
- Unclear return-on-investment (ROI) signals for operators. Success rates don’t translate directly into deployment decisions.
- A broken feedback loop for model developers. Without standardized, auditable benchmarks, it’s difficult to iterate toward real-world reliability.
By publishing synchronized video, logs, firmware versions, hardware configuration, and scoring artifacts for every run, PhAIL emphasizes auditability and reproducibility, said Positronic Robotics.
It launched PhAIL as a governed consortium rather than as a proprietary product. Nebius, which provides an AI cloud foundation for the robotics lifecycle, has joined as a founding consortium partner. Toloka participates as a data partner supporting evaluation processes. Positronic Robotics noted that the benchmark is intended as a shared industry yardstick, not as a competitive marketing vehicle.
“Scaling physical AI requires a clear, shared standard for production readiness,” said Evan Helda, head of physical AI at Nebius. “With no established blueprint for deploying these systems at scale, the PhAIL Leaderboard delivers an important benchmark grounded in real-world performance data, bringing greater transparency to what’s ready for deployment.”
“Nebius is committed to accelerating physical AI development across the ecosystem,” he added. “Through our participation in the PhAIL consortium, we’re proud to help advance the next phase of industrial robotics alongside industry partners.”
The PhAIL dataset and fine-tuning scripts are publicly available. Model developers can fine-tune their systems and submit checkpoints for evaluation. Hardware vendors can validate model performance across embodiments. Operators can review published artifacts directly.
Catch the latest in physical AI at the Robotics Summit & Expo
Registration is now open for the Robotics Summit & Expo, the world’s leading technical event for industrial robotics developers. The event is produced by The Robot Report and WTWH Media.
The show will have more than 50 sessions in tracks on artificial intelligence, design and development, enabling technologies, healthcare, and logistics. The Engineering Theater on the show floor will also feature presentations by industry experts.
More than 70 speakers are confirmed from companies such as AWS, Brain Corp, Fictiv, Harmonic Drive, maxon, PickNik Robotics, RealSense, the Robotics and AI Institute, Robust AI, Tesla, Toyota Research Institute, and more.
The Robotics Summit will also feature numerous networking opportunities. They include a Mix & Mingle Networking Reception after the first day of the show and the ticketed RBR50 Awards Dinner.
The Robotics Summit & Expo is co-located with DeviceTalks Boston, which focuses on medical devices.

The post PhAIL ranks top robotics foundation models on real hardware appeared first on The Robot Report.



