Yann LeCun and a team of researchers from NYU, Mila, and Brown University have introduced LeWorldModel (LeWM), a new AI system that can learn and plan from raw pixels while running on a single GPU. The paper is co-authored by Lucas Maes, Quentin Le Lidec, Damien Scieur, and Randall Balestriero.
The study presents a model that can learn directly from raw visual data and make predictions about future states of an environment, without relying on complex training tricks or massive compute.
Unlike many modern AI systems that rely on massive infrastructure, LeWorldModel is efficient. The model has around 15 million parameters and can be trained on a single GPU within a few hours.
It also allows for significantly faster decision-making. According to the paper, LeWorldModel can plan up to 48 times faster than some existing world models while maintaining competitive performance.
This could make such systems more practical for real-world applications like robotics and autonomous agents.
The release also reflects the direction of LeCun’s ongoing work, including his Joint Embedding Predictive Architecture (JEPA) line of research and his push for more efficient, physics-aware AI.
LeCun recently launched Advanced Machine Intelligence (AMI) Labs, a startup building world models that can understand and predict the physical world. The company has reportedly raised over $1 billion as it bets on a new direction for AI beyond text generation.
World models are systems that help AI predict how environments evolve and are seen as key to building more capable and autonomous agents. However, existing methods are often unstable, complex, and require multiple training hacks to work effectively.
LeCun and his collaborators propose a different approach.
LeWorldModel uses a streamlined design with just two loss functions, compared to several in earlier systems. This reduces complexity and improves stability during training.
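The article does not spell out the two loss terms, but a common two-loss recipe in joint-embedding world models pairs a latent prediction loss with a regularizer that keeps latent dimensions from collapsing to a constant. A hypothetical numpy sketch of that pattern (function names and the variance-style regularizer are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def prediction_loss(z_pred, z_target):
    """Mean squared error between the predicted and the actual next latent."""
    return np.mean((z_pred - z_target) ** 2)

def anti_collapse_loss(z_batch, eps=1e-4):
    """Penalize latent dimensions whose batch standard deviation falls
    below 1, discouraging the encoder from mapping every input to the
    same point (representation collapse)."""
    std = np.sqrt(z_batch.var(axis=0) + eps)
    return np.mean(np.maximum(0.0, 1.0 - std))

# A fully collapsed batch (all-zero latents) is maximally penalized,
# while a perfect prediction costs nothing.
collapsed = np.zeros((8, 16))
print(round(anti_collapse_loss(collapsed), 2))  # 0.99
print(prediction_loss(np.ones(16), np.ones(16)))  # 0.0
```

Training would minimize a weighted sum of the two terms; the appeal of a two-term objective is that there are far fewer hyperparameters to balance than in multi-term losses.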
“Existing methods remain fragile, relying on complex multi-term losses… to avoid representation collapse,” the paper notes.
By contrast, the new model achieves the same goal with a simpler setup that still prevents representation collapse, a failure mode in which a model maps every input to the same meaningless output.
Unlike many AI models that rely on rewards or labelled data, LeWorldModel learns in a reward-free, task-agnostic setting. It trains purely on sequences of images and actions, aiming to understand how environments change over time.
The system builds a compact internal representation, called a latent space, and predicts future states based on it.
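That mechanism can be made concrete with a toy sketch: encode an observation into a latent vector once, then roll the model forward in latent space using a predictor conditioned on actions. Everything here (the linear encoder/predictor stand-ins, the dimensions) is a hypothetical illustration of the general latent-world-model idea, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
OBS_DIM, ACT_DIM, LATENT_DIM = 64, 4, 16

# Random linear maps stand in for the learned encoder and predictor networks.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_pred = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACT_DIM))

def encode(obs):
    """Map a raw observation (e.g. flattened pixels) to a compact latent."""
    return np.tanh(W_enc @ obs)

def predict_next(z, action):
    """Predict the next latent state from the current latent and an action."""
    return np.tanh(W_pred @ np.concatenate([z, action]))

# Roll forward entirely in latent space: planning never decodes back to
# pixels, which is one reason latent world models can evaluate candidate
# action sequences much faster than pixel-space prediction.
obs = rng.normal(size=OBS_DIM)
z = encode(obs)
for t in range(5):
    action = rng.normal(size=ACT_DIM)
    z = predict_next(z, action)

print(z.shape)  # (16,)
```

A planner would score many such imagined rollouts and pick the action sequence whose predicted latents best match a goal, all without rendering a single future frame.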
Beyond performance, the researchers say the model shows early signs of understanding basic physical properties.
LeCun has long argued that building AI systems that understand the world, rather than just generate text or images, is critical for the next phase of AI.
This work aligns with that vision by showing that simpler, more efficient architectures can still learn useful world models.
However, the researchers acknowledge limitations. Current models still struggle with long-term planning and depend on large datasets for training.