AI coding agents lower barriers to robotics through 'code as policy'
Researchers from UC Berkeley, Nvidia, and other institutions argue that advanced coding models are bridging the gap between reliability and generalisation, potentially making robotics accessible to non-experts.

A recent demonstration in which the OpenClaw AI agent controlled a LeRobot 101 robotic arm has highlighted a practical application of the 'code as policy' approach. A user deployed the agent to calibrate the hardware, write Python scripts for object recognition, and help train a model to manipulate items. The demonstration reflects a broader industry shift in which AI-powered coding simplifies the configuration and training of robotic systems, reducing the technical expertise previously required to operate them.
The LeRobot 101 is part of an open-source project from Hugging Face and consists of a controller arm and a follower arm equipped with a camera. Before turning to OpenClaw, the user spent several hours trying to connect and calibrate the device manually, nearly damaging the motors by applying incorrect settings that caused them to overheat. With assistance from OpenClaw and Codex, the user quickly configured the connections and calibrated the joint positions, eventually producing a script that enabled the robot to identify and grip a red ball.
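The script from the demonstration is not reproduced here, so the following is only a minimal sketch of how a red object might be located in a camera frame, assuming the frame arrives as an RGB NumPy array. The function name `find_red_centroid`, the colour thresholds, and the synthetic test frame are all illustrative assumptions, not the user's code or any LeRobot API.

```python
import numpy as np


def find_red_centroid(rgb, min_pixels=20):
    """Return the (row, col) centroid of strongly red pixels, or None.

    Hypothetical helper: the thresholds are guesses that would need
    tuning for a real camera and lighting conditions.
    """
    # Widen the type so channel differences cannot wrap around in uint8.
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)

    # A pixel counts as "red" if red is bright and clearly dominant.
    mask = (r > 150) & (r - g > 60) & (r - b > 60)
    if mask.sum() < min_pixels:
        return None  # too few red pixels to trust a detection

    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())


# Synthetic 120x160 black frame with a red square standing in for the ball.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
frame[40:60, 100:120] = (220, 30, 30)

centroid = find_red_centroid(frame)
empty = find_red_centroid(np.zeros((120, 160, 3), dtype=np.uint8))
```

In a real setup the pixel centroid would still have to be mapped to arm coordinates, which depends on camera calibration that this sketch does not attempt.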
This workflow aligns with the 'code as policy' paradigm, first introduced in a 2022 research paper. The approach holds that AI coding skills can bridge the gap between conventional engineering methods, which are reliable but generalise poorly, and contemporary vision-language-action models, which generalise but are not yet fully reliable. Ken Goldberg, a roboticist at UC Berkeley, noted that this method has the potential to make robotics accessible to nearly anyone, describing it as a critical unlock for the technology's integration into society.
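As a rough illustration of the idea, a coding model under this paradigm would emit a short program that composes perception and control primitives, rather than outputting motor commands directly. The sketch below assumes a toy simulated robot. `SimRobot`, `detect`, `move_to`, `close_gripper`, and `pick_up` are hypothetical names invented for this example and do not correspond to OpenClaw, LeRobot, or CaP-Gym interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimRobot:
    """Toy stand-in for a robot API; every method here is hypothetical."""
    objects: dict                       # object name -> (x, y, z) position
    position: tuple = (0.0, 0.0, 0.0)   # current gripper position
    holding: Optional[str] = None
    log: list = field(default_factory=list)

    def detect(self, name):
        # Perception primitive: look up a known object position, or None.
        return self.objects.get(name)

    def move_to(self, xyz):
        # Control primitive: move the gripper to a target position.
        self.position = xyz
        self.log.append(("move_to", xyz))

    def close_gripper(self):
        # Control primitive: grasp whatever object sits at the gripper.
        for name, xyz in self.objects.items():
            if xyz == self.position:
                self.holding = name
        self.log.append(("close_gripper",))


def pick_up(robot, name):
    """The kind of policy a coding model might write for 'pick up <name>'."""
    target = robot.detect(name)
    if target is None:
        return False                    # explicit, inspectable failure path
    x, y, z = target
    robot.move_to((x, y, z + 0.10))     # hover above the object first
    robot.move_to(target)               # then descend onto it
    robot.close_gripper()
    return robot.holding == name


robot = SimRobot(objects={"red ball": (0.2, 0.1, 0.0)})
success = pick_up(robot, "red ball")
missing = pick_up(SimRobot(objects={}), "red ball")
```

Because the policy is ordinary code, each step can be read, logged, and tested, which is the reliability side of the trade-off described above. The generalisation comes from the model writing a new program for each new instruction.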
To measure the capabilities of coding models in this domain, researchers from UC Berkeley, Nvidia, Carnegie Mellon University, and Stanford developed a benchmark called CaP-X. According to the benchmark, Gemini is currently the most effective model for programming robots, a result attributed to Google DeepMind's focus on multimodal training. The same research group also released CaP-Gym, an environment in which coding agents can control simulated and real robots, and CaP-Agent0, an agentic framework that has been shown to boost coding model performance on manipulation tasks.
Spencer Huang, son of Nvidia CEO Jensen Huang, is involved in organising hackathons for 'vibe coding robots' and is collaborating with Goldberg on a research project to expand the compatibility of the code-as-policy approach with more robot software tools. Huang stated that enabling people to control robots through spoken or typed commands represents the true holy grail of the field, potentially allowing non-experts to build and deploy complex robotic systems with minimal friction.


