A research study from the Samsung Advanced Institute of Technology AI Lab, Montreal, proposes a small AI model called Tiny Recursive Model (TRM).
TRM, a 7-million-parameter model, achieved 45% accuracy on the ARC-AGI-1 benchmark, which assesses AI models on human-like, abstract, and visual reasoning tasks.
Notably, TRM scored higher than models such as Google’s Gemini 2.5 Pro (37%), OpenAI’s o3-mini-high (34.5%), and DeepSeek-R1 (15.8%), all of which are significantly larger, with hundreds of billions of parameters.
On the ARC-AGI-2 benchmark, the latest and most challenging iteration, TRM achieved 7.8% accuracy, whereas Gemini 2.5 Pro scored 4.9% and o3-mini-high scored 3%.
Currently, xAI’s Grok 4 leads both the ARC-AGI-1 and ARC-AGI-2 benchmarks, with 66.7% and 16% accuracy, respectively.
Alexia Jolicoeur-Martineau, the author of the paper, confirmed on X that training the model took less than $500, four NVIDIA H100 GPUs, and just two days. This is significantly less than what it takes to train large, billion-parameter general-purpose language models.
“Yes, it’s still possible to do cool stuff without a data centre,” said Sebastian Raschka, an AI research engineer, reacting to the cost efficiency on X.
Instead of relying on billions of parameters, TRM gets smarter by thinking in loops. Simply put, it begins with a rough answer, checks itself, and refines that answer through several incremental steps.
“This recursive process allows the model to progressively improve its answer (potentially addressing any errors from its previous answer) in an extremely parameter-efficient manner while minimising overfitting,” said the study.
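The loop described above can be sketched in a few lines. The toy below mirrors only the control flow the study describes, an inner loop that updates a latent "reasoning" state and an outer loop that uses it to revise the answer; the update rules, step sizes, and the square-root task are illustrative stand-ins, not the paper's learned networks.

```python
def refine(x, y, z, n_latent=6, n_outer=3):
    """Toy sketch of TRM-style recursion: repeatedly update a latent
    state z from (x, y, z), then update the answer y from (y, z).
    Here x is the problem (a number), y the current answer (a guess
    at sqrt(x)), and z a scratch "reasoning" value. All update rules
    are illustrative stand-ins for the paper's tiny network."""
    for _ in range(n_outer):           # outer answer-improvement steps
        for _ in range(n_latent):      # inner latent "reasoning" steps
            z = z + 0.1 * (x - y * y)  # latent accumulates the error signal
        y = y + 0.05 * z               # answer revised from the latent state
        z = 0.0                        # reset latent each outer step (a choice in this sketch)
    return y

# Starting from a rough guess y=1 for sqrt(9), more outer steps
# progressively improve the answer toward 3.0.
print(round(refine(x=9.0, y=1.0, z=0.0, n_outer=100), 2))  # → 3.0
```

The point of the sketch is that quality comes from iterating a small update many times rather than from a larger model; each outer pass gets to correct errors left by the previous one.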
The result supports the thesis that, with architectural innovations, small models can reason better on specific tasks than much larger ones. Aptly, the study is titled ‘Less is More’.
For more details on how the model was built, its improvements over the hierarchical reasoning model, and additional information on the evaluations, refer to the full technical report.
Several voices in the industry took to social media to react to the study, and many believe this can be a potentially huge AI breakthrough.
Deedy Das, partner at Menlo Ventures, said in a post on X, “Most AI companies today use general-purpose LLMs with prompting for tasks. For specific tasks, smaller models may not just be cheaper, but far higher quality!”
He added that startups could train models for under $1000 for specific subtasks like PDF extraction or time series forecasting. These models would enhance the general model, boost performance, and help build IP for automation tasks.