Effective robotics demands more than just datasets. Source: Erika AI, via Adobe Stock
Artificial intelligence is rapidly shifting focus from text-based chatbots and image recognition toward systems that can interact with the physical world, such as autonomous vehicles and industrial robots. Although significant progress has been made in training these technologies through vast datasets and computer simulations, a fundamental challenge persists: the disconnect between what a machine perceives and the complexities of reality.
An advanced ability to reason is useless without a precise understanding of the current physical conditions.
The shift from Physical AI 1.0 to 2.0
Today’s market is primarily defined by “Physical AI 1.0.” This current phase emphasizes scale, using enormous libraries of visual and textual data alongside highly realistic simulations (like NVIDIA’s Cosmos environment) to train machines before they are deployed.
However, this initial iteration suffers from a “vision-first” limitation. It relies on the assumption that high-resolution cameras and powerful processors are sufficient to predict outcomes. In practice, cameras suffer from glare, shadows obscure objects, and sensors often deliver contradictory or “noisy” information.
The next generation, “Physical AI 2.0,” adds a vital component to the technological framework: physics-based state reconstruction.
The difference is crucial because the measure of success in physical AI is no longer limited to the algorithm itself. In purely digital contexts, the algorithm is often the final product.
In robotics, however, the algorithm must function seamlessly with hardware sensors, simulation tools, policy training, process orchestration, safety protocols, edge computing, and real-world feedback. A robot that fails to accurately interpret its surroundings cannot “think” its way out of a dangerous situation.
Building the framework for movement
To operate safely in real-world conditions, a robotic system must execute a continuous cycle of four key functions:
- Predictive modeling: This provides “priors”—knowledge derived from previous experience and digital simulations about what events are likely to occur.
- State reconstruction: This is the crucial “missing link.” It takes raw, imperfect sensor data and accurately reconstructs the actual state of the environment. It transforms a rough estimate of a person’s location into a precise calculation of their path through a crowded space.
- Decision-making engines: Once the state is established, the AI evaluates the situation. It assesses risks, weighs options, and formulates the safest course of action, such as deciding whether to stop or proceed with caution.
- Execution: The final stage where the system performs a physical movement within carefully defined safety limits.
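The four-stage cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the world is reduced to one number (distance to the nearest obstacle), state reconstruction is stood in for by a one-dimensional Kalman-style update, and every name, threshold, and noise value (`State`, `predict_prior`, `stop_distance_m`, the `0.05` process noise, and so on) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    distance_m: float  # estimated distance to the nearest obstacle
    variance: float    # uncertainty of that estimate, in m^2


def predict_prior(previous: State, closing_speed_mps: float, dt: float) -> State:
    """Predictive modeling: project the last state forward and grow its uncertainty."""
    process_noise = 0.05  # assumed value, for illustration only
    return State(previous.distance_m - closing_speed_mps * dt,
                 previous.variance + process_noise)


def reconstruct_state(prior: State, measurement_m: float, measurement_var: float) -> State:
    """State reconstruction: blend the prior with a noisy measurement (1D Kalman update)."""
    gain = prior.variance / (prior.variance + measurement_var)
    distance = prior.distance_m + gain * (measurement_m - prior.distance_m)
    return State(distance, (1.0 - gain) * prior.variance)


def decide(state: State, stop_distance_m: float = 1.0) -> str:
    """Decision-making: act on a pessimistic estimate (mean minus two standard deviations)."""
    worst_case = state.distance_m - 2.0 * state.variance ** 0.5
    if worst_case < stop_distance_m:
        return "stop"
    if worst_case < 2.0 * stop_distance_m:
        return "proceed_with_caution"
    return "proceed"


def execute(decision: str, max_speed_mps: float = 1.5) -> float:
    """Execution: turn the decision into a speed command inside a fixed safety limit."""
    speeds = {"stop": 0.0, "proceed_with_caution": 0.3, "proceed": max_speed_mps}
    return min(speeds[decision], max_speed_mps)


def run_cycle(previous: State, closing_speed_mps: float, dt: float,
              measurement_m: float, measurement_var: float):
    """One pass through predict -> reconstruct -> decide -> execute."""
    prior = predict_prior(previous, closing_speed_mps, dt)
    state = reconstruct_state(prior, measurement_m, measurement_var)
    decision = decide(state)
    return state, decision, execute(decision)


if __name__ == "__main__":
    state, decision, speed = run_cycle(State(5.0, 0.1), 1.0, 0.1, 4.8, 0.15)
    print(state, decision, speed)
```

Note that `decide` only ever sees the reconstructed `State`. If `reconstruct_state` is fed a bad measurement with an overconfident variance, the decision logic runs exactly as designed and still produces the wrong command.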
The quality of an AI’s reasoning is directly tied to the accuracy of its environmental perception. If the underlying perception is flawed, even a sophisticated decision-making system might make a choice that is logically sound given its inputs but dangerously wrong in reality.
It is important to note that these systems influence control but do not directly trigger movements. In well-designed systems, the AI provides intent, goals, and safety limits, while the robot’s internal planning and control systems use those inputs to execute a specific, safe motion.
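One way to picture this separation is a small interface in which the AI layer emits only an intent, and a separate controller turns it into motion. The sketch below is an assumption-laden toy, not a description of any particular robot stack: `Intent`, `controller_step`, the proportional gain, and the hard speed limit are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the AI layer hands down: a goal and a limit, never a motor command."""
    target_position_m: float
    max_speed_mps: float


def controller_step(position_m: float, intent: Intent, dt: float,
                    gain: float = 2.0, hard_speed_limit_mps: float = 1.0):
    """Proportional controller owned by the robot, not by the AI.

    The commanded speed is clamped to the tighter of the AI's requested limit
    and the controller's own hard limit, so the AI cannot exceed the latter.
    """
    limit = min(intent.max_speed_mps, hard_speed_limit_mps)
    desired = gain * (intent.target_position_m - position_m)
    velocity = max(-limit, min(limit, desired))
    return position_m + velocity * dt, velocity


if __name__ == "__main__":
    # The AI asks for 5 m/s, but the controller's hard limit of 1 m/s wins.
    intent = Intent(target_position_m=10.0, max_speed_mps=5.0)
    position = 0.0
    for _ in range(3):
        position, velocity = controller_step(position, intent, dt=0.1)
        print(round(position, 3), velocity)
```

The design choice worth noticing is that the clamp lives in `controller_step`. A mistaken or overly aggressive intent changes where the robot tries to go, but not how fast the hardware is allowed to move.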
AI only becomes truly “physical” when it influences the real world through motion, creating a new set of environmental data for the next sequence of actions.
Why additional data isn’t a silver bullet
The “end-to-end” AI philosophy suggests that if we simply build larger models, machines will eventually learn to filter through bad sensor data on their own.
However, a specialized reconstruction layer offers a much more effective approach. By creating a dedicated module for physical state reconstruction, developers can leverage specialized hardware such as radar or tactile sensors to enhance “observability” before the primary AI even begins processing. This eliminates the need for every new robot to relearn the basics of physics from scratch.
While challenges are often categorized as “edge cases,” many are actually poorly observed cases. Standardized tests can show when a system fails at rare events like partial occlusions or unexpected behavior.
But knowing a scenario is difficult does not mean the system can recover the missing sensor data. A camera can capture more frames, and a model can spend more time categorizing them, but if the sensor data is fundamentally compromised, the AI’s reasoning will still be based on false premises.
In such situations, the answer is not simply more data. What is needed is a robust reconstruction layer that uses physics-based logic and advanced sensors to reveal what was originally hidden.
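A toy numerical example makes the point about compromised sensor data concrete. It assumes a camera whose range estimates are biased by glare, a single radar return, and inverse-variance weighting as a simplified stand-in for a physics-based reconstruction layer. All readings, variances, and names (`glare_frames`, `fuse`, and so on) are invented for illustration.

```python
from statistics import mean


def fuse(estimates):
    """Inverse-variance weighted fusion of (value, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
    return value, 1.0 / sum(weights)


TRUE_DISTANCE_M = 10.0

# Hypothetical camera range estimates under glare: noisy and biased about 2 m long.
glare_frames = [12.1, 11.9, 12.0, 12.2, 11.8]

# Averaging removes the random scatter but not the bias, however many frames arrive.
camera_5_frames = mean(glare_frames)
camera_500_frames = mean(glare_frames * 100)

# A second sensing modality (assumed radar return) that glare does not affect.
radar = (10.1, 0.04)

# The camera variance is inflated because glare was detected (assumed to be known).
camera = (camera_500_frames, 4.0)

fused_value, fused_variance = fuse([camera, radar])

if __name__ == "__main__":
    print("camera, 5 frames:  ", camera_5_frames)
    print("camera, 500 frames:", camera_500_frames)
    print("camera + radar:    ", fused_value, fused_variance)
```

In this sketch, going from 5 to 500 camera frames leaves the estimate roughly 2 m off, while one radar return, weighted by how much each sensor can be trusted, brings the fused estimate to within about 0.2 m of the true distance.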
Real-world applications: From laundry to driving
| Stage | Humanoid robot performing chores | Self-driving car in an urban setting |
|---|---|---|
| Predictive Modeling | Anticipating how different fabrics drape and fold | Simulating traffic patterns during a downpour |
| State Reconstruction | Mapping the exact shape of clothing despite folds, shadows, or limited visibility | Keeping track of a cyclist obscured by a parked vehicle in a complex scene |
| Decision-making | Choosing to re-fold, adjust grip, pause, or seek assistance | Determining whether to give way, stop, advance slightly, or reroute |
| Execution | Carefully folding a garment sleeve | Performing a controlled, safe turn or lane change |
The bottom line for future AI: Perception is paramount
The future of AI isn’t solely about creating “smarter” reasoning engines; it’s about building systems that are “better” at perceiving reality. The leaders in the AI space will be those that master the connection between digital forecasts and actual physical truths.
Visual and linguistic processing provides a foundation, but for AI to truly function in the real world, it needs a more reliable understanding of the environment it operates within.
Ultimately, what remains unseen can pose a greater risk than what is clearly visible.
Meet the Author
Dr. Behrooz Rezvani is a seasoned serial entrepreneur, engineer, and systems architect known for transforming complex mathematical theories into marketable platforms and products. He established Ikanos Communications, a company that revolutionized high-speed broadband connectivity, which was later purchased by Qualcomm Atheros.
He also co-founded Quantenna Communications, a dominant force in Wi-Fi chips, which was acquired by ON Semiconductor for roughly $1.07 billion. Currently, he serves as the founder and CEO of Atomathic, a company dedicated to developing the mathematical and inference infrastructure for physical AI to “uncover the invisible for defense, autonomous systems, robotics, aviation, and smart machinery,” supported by investors including RTX Ventures and GM Ventures.



