Google DeepMind Unveils SIMA 2

After initial training on human demonstrations, SIMA 2 can shift to self-directed learning, using tasks and reward estimates generated by Gemini.

Google DeepMind has unveiled SIMA 2, the latest version of its Scalable Instructable Multiworld Agent that can reason, collaborate with users, and learn autonomously inside 3D virtual environments. The researchers described the release as “a milestone in creating general and helpful AI agents.”

SIMA 2 incorporates the Gemini model as its core, which allows the agent to interpret instructions, understand high-level goals, and describe its planned actions.
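
DeepMind has not published SIMA 2's internal interfaces, so the sketch below is only a schematic reading of that description: a Gemini-based core takes a screen observation and a language instruction, states its plan, and emits low-level keyboard-and-mouse actions. GeminiCore, Action, and agent_step are hypothetical stand-ins for illustration, and the core here is a stub rather than a call to any real Gemini API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative stand-ins only; SIMA 2's real interfaces are not public,
# and GeminiCore below is a stub, not an actual Gemini call.

@dataclass
class Action:
    kind: str                          # e.g. "key_press" or "mouse_click"
    payload: dict = field(default_factory=dict)

class GeminiCore:
    """Stub for the Gemini-based reasoning core at the centre of the agent."""
    def plan_and_act(self, observation: str, instruction: str) -> Tuple[str, List[Action]]:
        # A real core would interpret the screen and the instruction,
        # describe its intended approach, then choose low-level actions.
        plan = f"To '{instruction}', I will walk to the target and interact with it."
        actions = [Action("key_press", {"key": "w"}),
                   Action("mouse_click", {"button": "left"})]
        return plan, actions

def agent_step(core: GeminiCore, observation: str, instruction: str,
               plan_log: List[str]) -> List[Action]:
    """One step of an instructable agent loop: reason, describe the plan, act."""
    plan, actions = core.plan_and_act(observation, instruction)
    plan_log.append(plan)              # the agent can describe what it plans to do
    return actions

if __name__ == "__main__":
    log: List[str] = []
    actions = agent_step(GeminiCore(), "screenshot of a forest clearing",
                         "chop down the nearest tree", log)
    print(log[0])
    print([a.kind for a in actions])
```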

“SIMA 2 can do more than just respond to instructions, it can think and reason about them,” the company said. The earlier version, SIMA 1, had been trained to execute more than 600 basic skills across various commercial games.

Training for SIMA 2 used both human demonstrations and labels generated by Gemini. This approach allows the agent to explain what it intends to do and how it plans to complete a task. According to the team, interactions now “feel less like giving commands and more like collaborating with a companion who can reason about the task at hand.”

Testing showed improved generalisation, with SIMA 2 carrying out complex instructions and succeeding in games it had never encountered, including the Viking survival title ASKA and the research environment MineDojo. 

The agent could also apply concepts learned in one game—such as mining—to comparable actions in other environments. Researchers noted that SIMA 2 has closed much of the performance gap between AI and human players across evaluation tasks.

In another experiment, SIMA 2 was combined with Genie 3, a model that creates new 3D worlds from a single image or text prompt. The agent was able to orient itself and follow user instructions inside these automatically generated environments.

A key capability in the new system is self-improvement. After initial training on human demonstrations, SIMA 2 can shift to self-directed learning, using tasks and reward estimates generated by Gemini. 

“This process allows the agent to improve on previously failed tasks entirely independently of human-generated demonstrations,” the company said. Data collected through this self-play is then used to train subsequent versions of the agent.
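
DeepMind describes this loop only at a high level, so the following is a schematic sketch rather than the actual training code: a Gemini-like model proposes tasks and scores the agent's attempts, and well-scored trajectories are kept as training data for the next generation of the agent. Every name here (propose_task, estimate_reward, self_improvement_round) is a hypothetical stand-in.

```python
import random

# Schematic self-improvement loop, assuming (not confirmed by DeepMind):
#  - a task proposer and reward estimator played by a Gemini-like model,
#  - a trainable agent acting in the game,
#  - a buffer of self-generated experience used to train the next generation.

def propose_task() -> str:
    """Stand-in for Gemini proposing a new task in the environment."""
    return random.choice(["chop a tree", "build a shelter", "mine iron ore"])

def attempt_task(agent, task: str) -> list:
    """Stand-in for the agent acting in the game; returns a trajectory."""
    return [f"{task}: step {i}" for i in range(3)]

def estimate_reward(task: str, trajectory: list) -> float:
    """Stand-in for Gemini judging how well the trajectory completed the task."""
    return random.random()

def self_improvement_round(agent, replay_buffer: list,
                           n_tasks: int = 100, keep_threshold: float = 0.7) -> list:
    """One round of self-directed learning, with no human demonstrations involved."""
    for _ in range(n_tasks):
        task = propose_task()
        trajectory = attempt_task(agent, task)
        reward = estimate_reward(task, trajectory)
        if reward >= keep_threshold:
            # Successful self-play data becomes training data for the
            # next generation of the agent.
            replay_buffer.append((task, trajectory, reward))
    return replay_buffer

if __name__ == "__main__":
    buffer = self_improvement_round(agent=None, replay_buffer=[])
    print(f"kept {len(buffer)} self-generated trajectories")
```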

Google DeepMind noted remaining limitations, including difficulty with very long, multi-step tasks, short interaction memory, and precision challenges when controlling games through virtual keyboard and mouse inputs. Visual understanding of complex 3D scenes also remains an area for improvement.

The company said it is releasing SIMA 2 as a limited research preview for a small group of academics and game developers. “We remain deeply committed to developing SIMA 2 responsibly,” Google DeepMind noted, referencing its collaboration with internal responsible development experts. 

Researchers said the work may eventually inform robotics, where skills such as navigation, tool use, and collaborative task execution are essential.

Meanwhile, World Labs, the startup founded by AI pioneer Fei-Fei Li, has made its generative world model, Marble, publicly available after a two-month beta with early users.

“Marble can create 3D worlds from text, images, video, or coarse 3D layouts,” World Labs said. “Users can interactively edit or expand worlds.”

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
