A research study from the Samsung Advanced Institute of Technology AI Lab, Montreal, proposes a small AI model called Tiny Recursive Model (TRM).
TRM, a 7-million-parameter model, achieved 45% accuracy on the ARC-AGI-1 benchmark, which assesses AI models on human-like, abstract, and visual reasoning tasks.
Notably, TRM scored higher than models such as Google’s Gemini 2.5 Pro (37%), OpenAI’s o3-mini-high (34.5%), and DeepSeek-R1 (15.8%), all of which are significantly larger, with hundreds of billions of parameters.
On the ARC-AGI-2 benchmark, the latest and most challenging iteration, TRM achieved 7.8% accuracy, whereas Gemini 2.5 Pro scored 4.9% and o3-mini-high scored 3%.
Currently, xAI’s Grok 4 leads both the ARC-AGI-1 and ARC-AGI-2 benchmarks, with 66.7% and 16% accuracy, respectively.
Alexia Jolicoeur-Martineau, the author of the paper, confirmed on X that training the model took less than $500, four NVIDIA H100 GPUs, and just two days. This is significantly less than what it takes to train large, billion-parameter general-purpose language models.
“Yes, it’s still possible to do cool stuff without a data centre,” said Sebastian Raschka, an AI research engineer, reacting to the cost efficiency on X.
Instead of relying on billions of parameters, TRM gets smarter by thinking in loops. Simply put, it begins with a rough answer, checks itself, and refines that answer through several incremental steps.
“This recursive process allows the model to progressively improve its answer (potentially addressing any errors from its previous answer) in an extremely parameter-efficient manner while minimising overfitting,” said the study.
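The loop described above can be sketched in a few lines. The toy below mirrors only the control flow the study describes, an inner loop that updates a latent "reasoning" state and an outer loop that uses it to revise the answer; the update rules, step sizes, and the square-root task are illustrative stand-ins, not the paper's learned networks.

```python
def refine(x, y, z, n_latent=6, n_outer=3):
    """Toy sketch of TRM-style recursion: repeatedly update a latent
    state z from (x, y, z), then update the answer y from (y, z).
    Here x is the problem (a number), y the current answer (a guess
    at sqrt(x)), and z a scratch "reasoning" value. All update rules
    are illustrative stand-ins for the paper's tiny network."""
    for _ in range(n_outer):           # outer answer-improvement steps
        for _ in range(n_latent):      # inner latent "reasoning" steps
            z = z + 0.1 * (x - y * y)  # latent accumulates the error signal
        y = y + 0.05 * z               # answer revised from the latent state
        z = 0.0                        # reset latent each outer step (a choice in this sketch)
    return y

# Starting from a rough guess y=1 for sqrt(9), more outer steps
# progressively improve the answer toward 3.0.
print(round(refine(x=9.0, y=1.0, z=0.0, n_outer=100), 2))  # → 3.0
```

The point of the sketch is that quality comes from iterating a small update many times rather than from a larger model; each outer pass gets to correct errors left by the previous one.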
The result supports the thesis that, with architectural innovations, small models can reason better on specific tasks than much larger ones. Aptly, the study is titled ‘Less is More’.
For more details on how the model was built, its improvements over the hierarchical reasoning model, and additional information on the evaluations, refer to the full technical report.
Several voices in the industry took to social media to react to the study, and many believe this can be a potentially huge AI breakthrough.
Deedy Das, partner at Menlo Ventures, said in a post on X, “Most AI companies today use general-purpose LLMs with prompting for tasks. For specific tasks, smaller models may not just be cheaper, but far higher quality!”
He added that startups could train models for under $1000 for specific subtasks like PDF extraction or time series forecasting. These models would enhance the general model, boost performance, and help build IP for automation tasks.