As artificial intelligence steps out of the digital realm and into the real world, the race to build the embodied “brains” powering next-generation robots has become the newest battleground in tech competition between China and the United States.
Two days after US chip giant Nvidia launched its Cosmos 3 model – designed to help physical AI “think before it acts” – a Chinese start-up stole the spotlight.
On Wednesday, Spirit AI, a start-up based in Hangzhou, Zhejiang province, said its foundation model for embodied intelligence, Spirit v1.6, had become the first from China to top the RoboArena global leaderboard.
Spirit v1.6 scored 1,924 on the benchmark, edging out Nvidia’s Cosmos3-Nano-Policy, which took second place with a score of 1,881. Third place went to DreamZero, another Nvidia project unveiled in February, with a score of 1,763.
The RoboArena benchmark, which evaluates how effectively generalist robot policies translate to real-world actions, was co-developed by Nvidia alongside elite institutions including Stanford University and the University of California, Berkeley.
The fierce competition underscores a broader shift: robotics is officially AI’s next frontier. Nvidia’s partnerships with China’s Unitree Robotics and Singaporean robotic hand pioneer Sharpa, both announced on Monday, also highlight this trend.
What is a physical AI model?
Unlike large language models (LLMs), which are built to process and generate text and code, a physical AI model allows machines – such as humanoids, robotic arms or autonomous vehicles – to perceive, understand and interact with the physical world.
Physical AI relies on two core capabilities. The first is policy capabilities: the model’s ability to take actions based on what it observes. This is the main metric measured by the RoboArena leaderboard.
The second is world capabilities: the model’s ability to simulate and predict what will happen next if a specific action is taken.
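The split between the two capabilities can be illustrated with a toy sketch in Python. It is not drawn from any of the models named in this article: the one-dimensional gripper, the names State, policy, world_model and rollout, and the one-unit step limit are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical toy state: a gripper's position along a single axis."""
    position: float


def policy(observation: State, target: float) -> float:
    """Policy capability: choose an action based on what is observed.

    The action is a step towards the target, limited to one unit per move
    (an assumed limit, chosen only for this example).
    """
    error = target - observation.position
    return max(-1.0, min(1.0, error))


def world_model(state: State, action: float) -> State:
    """World capability: predict the next state if the action is taken."""
    return State(position=state.position + action)


def rollout(start: State, target: float, steps: int) -> list[State]:
    """Alternate policy and world model to imagine a short trajectory."""
    states = [start]
    for _ in range(steps):
        action = policy(states[-1], target)
        states.append(world_model(states[-1], action))
    return states


if __name__ == "__main__":
    for state in rollout(State(position=0.0), target=2.5, steps=4):
        print(state.position)
```

In this sketch the policy only decides what to do next, while the world model only predicts the consequence of that decision. Real systems learn both from data rather than using hand-written rules like these.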
While these functions are often developed separately, the industry is moving towards consolidation. Last September, Chinese researchers introduced a unified “Policy World Model” that integrates world modelling and trajectory planning into one architecture.
China’s dominance is not limited to policy models. The WorldArena benchmark, which evaluates embodied world models, is currently topped by WorldScape-0.2, developed by Chinese start-up Manifold AI. It beat out Nvidia’s Cosmos-Predict 2.5 (action) in the policy evaluator track.

Which other Chinese companies are leading the field?
Beyond policy and prediction, Chinese firms are currently leading other tracks within the WorldArena and broader evaluation ecosystems.
The perception track is led by Chinese robotics giant AgiBot with its GenieEnvisioner-Sim2.0-2B model, a video world simulator for robotic manipulation unveiled last week.
The data engine track is led by Chinese start-up DexForce’s DSCFuncWorld, which focuses on optimising the pipeline of training data, the company said on Tuesday.
Meanwhile, Manifold AI’s WorldScape-0.2 has claimed the top spot on the WorldScore benchmark, designed to evaluate a model’s ability to generate worlds from text prompts, outperforming WonderJourney, a joint project by Stanford and Google.
How much money is at stake?
The fast-growing physical AI industry is being fuelled by an unprecedented wave of venture capital.
On Wednesday, Spirit AI announced a 1.5bil yuan (US$222mil/RM901mil) financing round. This marks its fourth funding round in just three months – the most aggressive fundraising pace seen in the sector.
XYZ Embodied AI, incubated by the Beijing Academy of Artificial Intelligence, said on the same day that it closed its pre-A round this week, revealing it had amassed 1bil yuan (RM599mil) in just 10 months to develop “embodied brains” and world models.
Manifold AI, the benchmark-topping start-up, has completed five funding rounds in just 10 months, with its latest April round securing “hundreds of millions of yuan”, according to the company.
What is the ultimate bottleneck?
Despite the capital, the global industry faces a major hurdle.
For “robotic systems and physical AI, data is the hardest problem”, Nvidia CEO Jensen Huang said on Monday while announcing the firm’s partnership with Unitree.
However, China may hold a structural advantage. Alexandr Wang, the founder of Scale AI who has since joined Meta Platforms, said last year that China was “fundamentally very well positioned on data”. He said many US companies “rely on data from China in training these robotics foundation models”.
In China, authorities in tech hubs like Beijing and Shenzhen have established state-backed “data factories” to collect robotics data. – South China Morning Post
