Chinese AI startup MiniMax has launched MiniMax-M2.5, a new foundation model that the company claims delivers state-of-the-art performance in coding, search, tool use, and office workflows while lowering runtime costs for enterprise agents.
MiniMax said M2.5 scored 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp with context management.
It added that the model completed SWE-Bench Verified 37% faster than its predecessor, M2.1, with an average runtime of 22.8 minutes per task, comparable to Claude Opus 4.6.
“M2.5 is the first frontier model where users do not need to worry about cost, delivering on the promise of intelligence too cheap to meter,” the company said in a statement.
MiniMax said M2.5 consumed an average of 3.52 million tokens per SWE-Bench Verified task, compared with 3.72 million tokens for M2.1.
The company attributed the speed gains to higher inference throughput, more efficient task decomposition and parallel tool calling.
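Parallel tool calling is a general agent technique rather than anything MiniMax has documented publicly; as a minimal sketch of the idea, with hypothetical tool functions standing in for real search or shell calls, independent tool invocations are issued concurrently instead of one after another:

```python
import asyncio

# Hypothetical tool stubs; a real agent would call search APIs, shells, etc.
async def run_tests(path: str) -> str:
    await asyncio.sleep(1.0)  # simulate I/O-bound tool latency
    return f"tests passed in {path}"

async def grep_repo(pattern: str) -> str:
    await asyncio.sleep(1.0)
    return f"3 matches for '{pattern}'"

async def main() -> None:
    # The two calls do not depend on each other, so running them in parallel
    # cuts wall-clock time to roughly one tool latency instead of two.
    results = await asyncio.gather(
        run_tests("src/"),
        grep_repo("TODO"),
    )
    print(results)

asyncio.run(main())
```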
Over the past three and a half months, MiniMax has rolled out three iterations of its M2 series—M2, M2.1, and M2.5. The company said performance gains on benchmarks such as SWE-Bench Verified have progressed faster than rival model families, including Claude, GPT-5, and Gemini.
Pricing and Deployment
MiniMax released two versions: M2.5 and M2.5-Lightning. The two share the same capabilities but differ in speed and price.
M2.5-Lightning runs at 100 tokens per second and costs $0.3 per million input tokens and $2.4 per million output tokens. The standard M2.5 version runs at 50 tokens per second and costs half as much. Both versions support caching.
“At a rate of 100 output tokens per second, running M2.5 continuously for an hour costs $1,” the company said. “At a rate of 50 TPS, the price drops to $0.3.”
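The quoted per-hour figures are roughly consistent with the published prices and speeds. A back-of-the-envelope check, counting output tokens only (the quoted totals presumably also fold in input tokens and caching), looks like this:

```python
def output_cost_per_hour(tokens_per_second: float, usd_per_million_output: float) -> float:
    """Output-token cost of running the model continuously for one hour."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_output

# M2.5-Lightning: 100 TPS at $2.4 per million output tokens
print(output_cost_per_hour(100, 2.4))  # ~$0.86/hour, close to the quoted ~$1
# Standard M2.5: 50 TPS at half the Lightning price ($1.2 per million)
print(output_cost_per_hour(50, 1.2))   # ~$0.22/hour, in line with the quoted ~$0.3
```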
MiniMax said the total cost per SWE-Bench Verified task is about 10% of the equivalent cost with Claude Opus 4.6.
Coding and Software Development
MiniMax said M2.5 was trained across more than 200,000 real-world environments and supports over 10 programming languages, including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby.
The company said the model can handle the full software lifecycle, from system design and environment setup to feature iteration and code review. It also upgraded its internal VIBE benchmark to a Pro version to test more complex tasks, where M2.5 performed on par with Opus 4.5.
On SWE-Bench Verified tests conducted across different coding agent harnesses, MiniMax said M2.5 scored 79.7 on Droid, compared with 78.9 for Opus 4.6. On OpenCode, M2.5 scored 76.1, slightly ahead of Opus 4.6 at 75.9.
The company said the model has developed a “spec-writing tendency,” decomposing tasks and planning features and structure before generating code.
Search and Tool Use
MiniMax said M2.5 achieved higher scores on BrowseComp and Wide Search benchmarks and showed improved generalisation across unfamiliar scaffolding environments.
To measure professional research tasks, the company built an internal benchmark called RISE (Realistic Interactive Search Evaluation). “The results show that M2.5 excels at expert-level search tasks in real-world settings,” the company said.
Office Workflows
MiniMax said it trained M2.5 with input from professionals in finance, law, and social sciences to improve deliverable quality in Word, PowerPoint, and Excel tasks.
The model has been deployed in MiniMax Agent, where Office Skills are automatically loaded based on file type. Users can combine these skills with domain-specific standards to create reusable “Experts” for research and financial modelling workflows.
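MiniMax has not published how Skill loading works under the hood; as a purely hypothetical illustration of the pattern described, with made-up skill names and no connection to the actual MiniMax Agent API, dispatching skills by file type and layering on an Expert's domain standards could be as simple as:

```python
from pathlib import Path

# Hypothetical mapping of file extensions to skill names; the real
# MiniMax Agent skill registry and naming are not public.
OFFICE_SKILLS = {
    ".docx": ["document_drafting", "style_compliance"],
    ".xlsx": ["financial_modelling", "formula_audit"],
    ".pptx": ["slide_layout", "narrative_outline"],
}

def load_skills(file_path: str, expert_standards: list[str] | None = None) -> list[str]:
    """Pick skills from the file type, then append the domain-specific
    standards that make up a reusable 'Expert'."""
    skills = list(OFFICE_SKILLS.get(Path(file_path).suffix.lower(), []))
    return skills + (expert_standards or [])

print(load_skills("q3_model.xlsx", expert_standards=["ifrs_reporting_rules"]))
```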
MiniMax said more than 10,000 Experts have been built on the platform. It added that 30% of tasks inside the company are now completed autonomously by M2.5, and that the model generates 80% of newly committed code internally.
Reinforcement Learning and Infrastructure
MiniMax attributed the model’s gains to scaling reinforcement learning across hundreds of thousands of training environments derived from internal workflows.
The company built an agent-native RL framework called Forge to decouple training and inference engines from agent scaffolding. It said system optimisations, including asynchronous scheduling and a tree-structured sample merging strategy, delivered a 40x training speedup.
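MiniMax has not detailed how Forge's tree-structured sample merging works. One common reading of the idea, not necessarily MiniMax's, is that agent rollouts branching from a shared history are folded into a prefix tree so the shared segments are stored and processed once; a toy sketch:

```python
# Toy prefix-sharing sketch: rollouts with a common prefix are merged into a
# trie, so shared steps appear once instead of once per rollout.
def build_trie(rollouts: list[list[str]]) -> dict:
    root: dict = {}
    for rollout in rollouts:
        node = root
        for step in rollout:
            node = node.setdefault(step, {})
    return root

def count_nodes(node: dict) -> int:
    return sum(1 + count_nodes(child) for child in node.values())

rollouts = [
    ["plan", "read_file", "edit", "run_tests"],
    ["plan", "read_file", "edit", "commit"],
    ["plan", "search", "edit", "run_tests"],
]
flat = sum(len(r) for r in rollouts)        # 12 steps stored naively
merged = count_nodes(build_trie(rollouts))  # 8 trie nodes after merging prefixes
print(flat, merged)
```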
MiniMax also said it continued using its CISPO algorithm to stabilise large-scale mixture-of-experts training and introduced process rewards to address long-context credit assignment in agent rollouts.