Google DeepMind Unveils Genie 3

Genie 3 turns a single image or text prompt into a playable world, using a built-in latent action representation to translate user inputs into character movement, physics, and environment responses in real time.

Google DeepMind has unveiled Genie 3, a new world model capable of generating interactive 3D environments from inputs as simple as a single image or a short text prompt. The model, trained without supervision or environment labels, lets users control a character in a simulated world derived from the input.

A text prompt describing an environment is enough: the model then simulates it in real time at 24 frames per second, maintaining visual consistency at 720p for several minutes.

Genie 3 is designed as a world model: it predicts future frames, rewards, and actions from video data. Unlike previous models, Genie learns in an unsupervised way, trained purely on internet videos, with no labelled environments or action annotations.

It can generalise to new visual inputs, generating interactive, controllable environments without fine-tuning. The training set includes 30 million video clips, with the paired action traces inferred by the model itself rather than supplied as labels, making it one of the largest unsupervised datasets for world modelling to date.
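
To make that training recipe concrete, here is a minimal PyTorch sketch of the unsupervised idea: infer a latent action from a pair of consecutive frames, then reconstruct the second frame from the first frame plus that action. Every module name, layer size, and shape below is an illustrative assumption, not DeepMind's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionEncoder(nn.Module):
    """Infers a compact latent action code from two consecutive frames."""
    def __init__(self, frame_dim: int = 1024, action_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, frame_t, frame_t1):
        # The "action" is whatever compressed signal best explains the transition.
        return self.net(torch.cat([frame_t, frame_t1], dim=-1))

class TransitionDecoder(nn.Module):
    """Predicts the next frame from the current frame plus the latent action."""
    def __init__(self, frame_dim: int = 1024, action_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def forward(self, frame_t, action):
        return self.net(torch.cat([frame_t, action], dim=-1))

# Toy feature vectors standing in for two consecutive video frames.
encoder, decoder = LatentActionEncoder(), TransitionDecoder()
frame_t, frame_t1 = torch.randn(4, 1024), torch.randn(4, 1024)

action = encoder(frame_t, frame_t1)    # inferred, never labelled
pred_t1 = decoder(frame_t, action)     # predict the next frame
loss = F.mse_loss(pred_t1, frame_t1)   # pure reconstruction loss
```

In the real pipeline the inputs would be token grids from the video tokeniser rather than raw feature vectors, but the training signal is the same: reconstruction error alone, with no action labels anywhere in the data.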

“Genie 3 is the first real-time interactive general-purpose world model,” Shlomi Fruchter, a Research Director at DeepMind, said during a press briefing. 

“It goes beyond narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.”

Users provide a single image (drawn, rendered, or a real-world photo). Genie 3 then:

  • Extracts the spatial layout from the image,
  • Uses its latent action model to understand possible movements,
  • Renders a dynamic, controllable 3D environment where the user can interact as if playing a side-scrolling video game.

According to DeepMind, the model supports motion and interaction via a built-in latent action representation that handles character movement based on user inputs. This allows the system to simulate physics, character control, and environment responses in real time.
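
Put together, the interaction loop described above might look like the following hypothetical sketch. The `world_model` interface and the `read_user_input` and `display` helpers are assumptions for illustration; DeepMind has not published an API for Genie 3.

```python
import time

FPS = 24
FRAME_BUDGET = 1.0 / FPS  # real time at 24 fps leaves ~41.7 ms per frame

def run_session(world_model, first_frame, read_user_input, display):
    """Drive an interactive session against a world model (hypothetical API)."""
    # Extract the spatial layout of the input image into an internal state.
    state = world_model.init_state(first_frame)
    while True:
        start = time.monotonic()
        # Map a raw user input (e.g. an arrow key) to a latent action code.
        action = world_model.to_latent_action(read_user_input())
        # Predict the next state and frame conditioned on that action.
        state, frame = world_model.step(state, action)
        display(frame)
        # Sleep off whatever is left of the frame budget to hold 24 fps.
        time.sleep(max(0.0, FRAME_BUDGET - (time.monotonic() - start)))
```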

Genie 3’s pipeline includes:

  • Spatiotemporal Video Tokeniser: Converts video frames into discrete tokens for efficient learning.
  • Latent Action Model: Learns a compressed representation of actions directly from raw video, without explicit action labels.
  • Dynamics Model: Predicts the next frames and states.
  • Renderer: Converts learned tokens back into realistic 3D-like frames.
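
A structural sketch of how these four components could compose is shown below. Every class and method signature is hypothetical; the point is only the data flow: frames become tokens, tokens plus a latent action become the next tokens, and the renderer turns tokens back into frames.

```python
class VideoTokenizer:
    """Spatiotemporal video tokeniser: raw frames -> discrete tokens."""
    def encode(self, frames):
        ...

class LatentActionModel:
    """Compresses the change between consecutive frames into an action code."""
    def infer(self, tokens_t, tokens_t1):
        ...

class DynamicsModel:
    """Predicts the next token grid from current tokens and an action code."""
    def predict(self, tokens, action):
        ...

class Renderer:
    """Turns predicted tokens back into viewable, 3D-like frames."""
    def decode(self, tokens):
        ...

class GeniePipeline:
    """End-to-end composition: tokenise, act, predict, render."""
    def __init__(self):
        self.tokenizer = VideoTokenizer()
        # The action model is used at training time to infer codes from video.
        self.action_model = LatentActionModel()
        self.dynamics = DynamicsModel()
        self.renderer = Renderer()

    def step(self, frame, action_code):
        tokens = self.tokenizer.encode(frame)                     # tokenise
        next_tokens = self.dynamics.predict(tokens, action_code)  # dynamics
        return self.renderer.decode(next_tokens)                  # render
```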

All components were trained end-to-end, using data from open internet sources, without any game engine involvement. However, Genie 3 isn’t available for public preview yet and will be rolled out to a select group of creators for testing. 

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
