Google DeepMind Unveils SIMA 2

After initial training on human demonstrations, SIMA 2 can shift to self-directed learning, using tasks and reward estimates generated by Gemini.

Google DeepMind has unveiled SIMA 2, the latest version of its Scalable Instructable Multiworld Agent that can reason, collaborate with users, and learn autonomously inside 3D virtual environments. The researchers described the release as “a milestone in creating general and helpful AI agents.”

SIMA 2 incorporates the Gemini model as its core, which allows the agent to interpret instructions, understand high-level goals, and describe its planned actions.
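
DeepMind has not published SIMA 2's internal interfaces, so the sketch below is only a schematic reading of that description: a Gemini-based core takes a screen observation and a language instruction, states its plan, and emits low-level keyboard-and-mouse actions. GeminiCore, Action, and agent_step are hypothetical stand-ins for illustration, and the core here is a stub rather than a call to any real Gemini API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative stand-ins only; SIMA 2's real interfaces are not public,
# and GeminiCore below is a stub, not an actual Gemini call.

@dataclass
class Action:
    kind: str                          # e.g. "key_press" or "mouse_click"
    payload: dict = field(default_factory=dict)

class GeminiCore:
    """Stub for the Gemini-based reasoning core at the centre of the agent."""
    def plan_and_act(self, observation: str, instruction: str) -> Tuple[str, List[Action]]:
        # A real core would interpret the screen and the instruction,
        # describe its intended approach, then choose low-level actions.
        plan = f"To '{instruction}', I will walk to the target and interact with it."
        actions = [Action("key_press", {"key": "w"}),
                   Action("mouse_click", {"button": "left"})]
        return plan, actions

def agent_step(core: GeminiCore, observation: str, instruction: str,
               plan_log: List[str]) -> List[Action]:
    """One step of an instructable agent loop: reason, describe the plan, act."""
    plan, actions = core.plan_and_act(observation, instruction)
    plan_log.append(plan)              # the agent can describe what it plans to do
    return actions

if __name__ == "__main__":
    log: List[str] = []
    actions = agent_step(GeminiCore(), "screenshot of a forest clearing",
                         "chop down the nearest tree", log)
    print(log[0])
    print([a.kind for a in actions])
```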

“SIMA 2 can do more than just respond to instructions, it can think and reason about them,” the company said. The earlier version, SIMA 1, had been trained to execute more than 600 basic skills across various commercial games.

Training for SIMA 2 used both human demonstrations and labels generated by Gemini. This approach allows the agent to explain what it intends to do and how it plans to complete a task. According to the team, interactions now “feel less like giving commands and more like collaborating with a companion who can reason about the task at hand.”

Testing showed improved generalisation, with SIMA 2 carrying out complex instructions and succeeding in games it had never encountered, including the Viking survival title ASKA and the research environment MineDojo. 

The agent could also apply concepts learned in one game—such as mining—to comparable actions in other environments. Researchers noted that SIMA 2 has closed much of the performance gap between AI and human players across evaluation tasks.

In another experiment, SIMA 2 was combined with Genie 3, a model that creates new 3D worlds from a single image or text prompt. The agent was able to orient itself and follow user instructions inside these automatically generated environments.

A key capability in the new system is self-improvement. After initial training on human demonstrations, SIMA 2 can shift to self-directed learning, using tasks and reward estimates generated by Gemini. 

“This process allows the agent to improve on previously failed tasks entirely independently of human-generated demonstrations,” the company said. Data collected through this self-play is then used to train subsequent versions of the agent.
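
DeepMind describes this loop only at a high level, so the following is a schematic sketch rather than the actual training code: a Gemini-like model proposes tasks and scores the agent's attempts, and well-scored trajectories are kept as training data for the next generation of the agent. Every name here (propose_task, estimate_reward, self_improvement_round) is a hypothetical stand-in.

```python
import random

# Schematic self-improvement loop, assuming (not confirmed by DeepMind):
#  - a task proposer and reward estimator played by a Gemini-like model,
#  - a trainable agent acting in the game,
#  - a buffer of self-generated experience used to train the next generation.

def propose_task() -> str:
    """Stand-in for Gemini proposing a new task in the environment."""
    return random.choice(["chop a tree", "build a shelter", "mine iron ore"])

def attempt_task(agent, task: str) -> list:
    """Stand-in for the agent acting in the game; returns a trajectory."""
    return [f"{task}: step {i}" for i in range(3)]

def estimate_reward(task: str, trajectory: list) -> float:
    """Stand-in for Gemini judging how well the trajectory completed the task."""
    return random.random()

def self_improvement_round(agent, replay_buffer: list,
                           n_tasks: int = 100, keep_threshold: float = 0.7) -> list:
    """One round of self-directed learning, with no human demonstrations involved."""
    for _ in range(n_tasks):
        task = propose_task()
        trajectory = attempt_task(agent, task)
        reward = estimate_reward(task, trajectory)
        if reward >= keep_threshold:
            # Successful self-play data becomes training data for the
            # next generation of the agent.
            replay_buffer.append((task, trajectory, reward))
    return replay_buffer

if __name__ == "__main__":
    buffer = self_improvement_round(agent=None, replay_buffer=[])
    print(f"kept {len(buffer)} self-generated trajectories")
```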

Google DeepMind noted remaining limitations, including difficulty with very long, multi-step tasks, short interaction memory, and precision challenges when controlling games through virtual keyboard and mouse inputs. Visual understanding of complex 3D scenes also remains an area for improvement.

The company said it is releasing SIMA 2 as a limited research preview for a small group of academics and game developers. “We remain deeply committed to developing SIMA 2 responsibly,” Google DeepMind noted, referencing its collaboration with internal responsible development experts. 

Researchers said the work may eventually inform robotics, where skills such as navigation, tool use, and collaborative task execution are essential.

Meanwhile, World Labs, the startup founded by AI pioneer Fei-Fei Li, has made its generative world model, Marble, publicly available after a two-month beta with early users.

“Marble can create 3D worlds from text, images, video, or coarse 3D layouts,” World Labs said. “Users can interactively edit or expand worlds.”

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
