Models, simulation, edge compute, and cloud training are all part of the same system. The hard part is where they fail to meet cleanly.
Core concept
A VLA takes images and task context as input, then outputs actions. The attraction is simpler software and broader generalization across tasks.
The unresolved question is reliability across new environments, tools, bodies, and failure cases.
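As a rough sketch of that input/output contract (the interface below is hypothetical and not any specific model's API; the 7-DoF action shape is an assumption), a VLA policy maps an image plus a task instruction to a low-level action:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    """Joint-space command; real systems also carry gripper state, timing, etc."""
    joint_deltas: List[float]

class VLAPolicy:
    """Hypothetical vision-language-action policy interface."""

    def act(self, image: bytes, instruction: str) -> Action:
        # A real model would encode the image and text jointly and decode
        # an action token sequence; this sketch returns a no-op command.
        return Action(joint_deltas=[0.0] * 7)

policy = VLAPolicy()
a = policy.act(image=b"", instruction="pick up the red mug")
print(len(a.joint_deltas))  # 7 joints in this sketch
```

The point of the shape is the simplification the section describes: perception, language grounding, and control collapse into one call.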
Models, simulation, digital twins, and edge compute are increasingly being sold as one integrated path.
- Models: A general model layer for robots. If builders adopt it, more of robotics starts to look like a software platform problem.
- Synthetic data: Scenario generation aimed at reducing the cost of learning from scarce physical traces.
- Simulation: Training and testing environments that try to move more work upstream before a robot reaches the field.
- Digital twins: Virtual copies of factories, warehouses, and workcells for planning, layout testing, and operations.
- Edge compute: On-robot compute for low-latency inference, perception, and safety-critical loops.
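The last point, safety-critical loops, has a concrete shape: a control step that refuses to act on stale model output. A minimal sketch (the 50 Hz budget and the fallback action are assumptions, not from the source):

```python
import time

CONTROL_PERIOD_S = 0.02  # assumed 50 Hz control loop budget

def run_step(infer, safe_stop_action):
    """Run one control step; fall back to a safe action on a deadline miss."""
    start = time.monotonic()
    action = infer()
    if time.monotonic() - start > CONTROL_PERIOD_S:
        # Inference overran the loop period: the world has moved on,
        # so do not trust the stale output.
        return safe_stop_action
    return action

def fast_infer():
    return "move"

def slow_infer():
    time.sleep(0.05)  # simulate an inference stall past the deadline
    return "move"

print(run_step(fast_infer, "stop"))  # move
print(run_step(slow_infer, "stop"))  # stop
```

This is why such loops stay on the robot: a cloud round trip can blow the budget on any given step.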
The model layer matters because it determines how much robotics can inherit the economics of software.
- VLA: The strongest ecosystem push so far around humanoid and general robot control.
- VLA: A bet that one policy family can generalize across many robot bodies and tasks.
- Vision-language to action: An early proof that language-conditioned models can output robot actions rather than only descriptions.
- Multimodal: Google’s push toward stronger spatial reasoning and dexterity in the Gemini family.
- Open source: A generalist open policy trained on Open X-Embodiment data.
- On-device: Small open models matter in robotics because many workloads cannot assume a persistent cloud connection.
The boundary between robot and cloud is mostly a question of latency, safety, and cost.
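That framing can be written down as a placement rule. A toy sketch (all thresholds and the cloud round-trip figure are invented for illustration):

```python
def place_workload(latency_budget_ms: float, safety_critical: bool,
                   cloud_rtt_ms: float, cost_per_call: float,
                   max_cost_per_call: float) -> str:
    """Toy rule: safety-critical or tight-latency loops stay on-robot;
    everything else goes to the cloud only if round trip and cost fit."""
    if safety_critical or latency_budget_ms < cloud_rtt_ms:
        return "edge"
    if cost_per_call > max_cost_per_call:
        return "edge"
    return "cloud"

# A 10 ms safety loop stays on the robot; a 500 ms planning call can leave it.
print(place_workload(10, True, 80, 0.001, 0.01))    # edge
print(place_workload(500, False, 80, 0.001, 0.01))  # cloud
```

The real decision has more dimensions (privacy, connectivity, fleet scale), but latency, safety, and cost already partition most workloads.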
Reality gap
95% success in simulation vs. 60% success in reality.
That gap is the real technical problem. Better simulation matters only if it improves what survives once the robot reaches real people, inventory, and environments.
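The gap is easy to quantify once both success rates are measured. A small sketch (the trial counts are assumptions; only the 95% and 60% rates come from the figures above):

```python
def transfer(sim_successes: int, sim_trials: int,
             real_successes: int, real_trials: int) -> dict:
    """Summarize sim-to-real transfer: both success rates, the absolute gap,
    and the fraction of sim performance that survives contact with reality."""
    sim_rate = sim_successes / sim_trials
    real_rate = real_successes / real_trials
    return {
        "sim": sim_rate,
        "real": real_rate,
        "gap": sim_rate - real_rate,
        "retained": real_rate / sim_rate,  # 1.0 would be perfect transfer
    }

# The rates above: 95/100 in sim, 60/100 in reality (assumed 100 trials each).
stats = transfer(95, 100, 60, 100)
print(f"{stats['gap']:.2f} gap, {stats['retained']:.0%} retained")
```

Tracking "retained" over time is one way to tell whether better simulation is actually improving what survives deployment, rather than just inflating the sim number.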
World models
World models matter because they may let teams learn more from each unit of real-world data. They do not remove the need for verification.
The loop: capture, simulate, expand, verify.
1. Real-world capture: Fleet sensors, teleoperation traces, and egocentric video record what actually happened in the field.
2. World models: Physics-aware models turn those traces into replayable environments and counterfactuals.
3. Synthetic expansion: Generated scenarios widen coverage before scarce robot-hours are spent in reality.
4. Verification: Reality remains the gate. Policies still need real-world validation before they earn deployment.
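One way to make the verification gate concrete is a statistical deployment check: require that the lower confidence bound on real-world success, not the raw rate, clears a threshold. A sketch (the 90% threshold and trial counts are assumptions; the Wilson score interval is one standard choice, not something the source specifies):

```python
import math

def wilson_lower(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the 95% Wilson score interval for a success rate."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom

def ready_to_deploy(successes: int, trials: int, threshold: float = 0.90) -> bool:
    """Gate: deploy only if we are confident the true success rate clears it."""
    return wilson_lower(successes, trials) >= threshold

# Same observed rate (~96.7%), very different confidence.
print(ready_to_deploy(58, 60))    # False: too few trials to be sure
print(ready_to_deploy(580, 600))  # True
```

The design choice is the point: world models and synthetic expansion can raise the observed rate cheaply, but only real trials shrink the interval that the gate actually checks.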