Google DeepMind merges Street View data into Genie world model for robotics and simulation
The tech giant’s latest move aims to enhance spatial continuity for agents, but the model remains an experimental tool lacking physics awareness and photorealistic quality.

Google DeepMind has integrated Google Street View imagery into Project Genie, its general-purpose world model, to create immersive and interactive simulations of real-world environments. Announced at the Google I/O developer conference, the integration allows users to simulate weather changes and rare scenarios, such as snow in New York City or sunlight glinting off buildings in London. The technology aims to support robotics training, gaming, and educational experiences by providing spatial continuity and diverse environmental data.
The feature is now available to Google AI Ultra subscribers in the United States, with a global rollout planned for the coming weeks. Jack Parker-Holder, a research scientist on DeepMind’s open-endedness team, described the capability as powerful for both agent and robotics use cases. He gave examples such as simulating London’s scarce sunlight to prevent sensory shock for new robots, or visualising snowy conditions in New York for seasonal planning.
Google has collected more than 280 billion images across 110 countries and seven continents over the past 20 years. Parker-Holder noted that combining this rich source of real-world information with the ability to simulate worlds offers significant potential for understanding diverse environments.
Jonathan Herbert, director of Google Maps, highlighted that the real breakthrough lies in the AI’s spatial continuity. The model correctly remembers and simulates the environment behind a user when they turn 360 degrees, allowing it to build new environments on top of that anchored reality.
Despite this spatial continuity, the model currently lacks physics awareness. In demonstrations, a simulated woman ran through cacti and bushes in a snowy Joshua Tree environment, indicating that the model does not yet understand cause and effect. This contrasts with more advanced video generators like Veo, which intuitively learn physics through passive observation.
The model is described as an experiment: besides lacking physics awareness, it falls short of photorealistic quality, and researchers estimate it is six to twelve months behind video generation models in accuracy. Parker-Holder acknowledged these limitations and said the team expects to close the accuracy and quality gaps within the next six to twelve months. The goal is to put this new capability into as many hands as possible, according to Diego Rivas, a product manager at DeepMind.
Genie 3 is already assisting Waymo in training its self-driving cars on exceedingly rare events. The addition of Street View data lets the viewpoint shift to agents such as humans or robots, rather than being limited to a car’s point of view, potentially helping Waymo prepare for launches in more cities globally.


