French AI startup Mistral AI has announced two new models—Leanstral, an open-source code agent built for formal verification, and Mistral Small 4, a unified multimodal reasoning model—as part of its push to expand open, production-ready AI systems.
Leanstral is positioned as the first open-source code agent designed specifically for Lean 4, targeting use cases where correctness must be formally proven rather than manually reviewed. The company said current coding agents face a scaling bottleneck due to human verification requirements in high-stakes domains.
“We envision a more helpful generation of coding agents to both carry out their tasks and formally prove their implementations against strict specifications,” Mistral said in its announcement.
Leanstral is designed to generate code and formally prove its correctness within Lean environments, operating across full proof engineering workflows rather than isolated tasks.
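To make the idea concrete, a Lean-native agent's output pairs an implementation with a machine-checked proof that the implementation meets its specification. The following is a minimal hand-written illustration in Lean 4 (not Leanstral output):

```lean
-- A trivial function together with a formal statement of its
-- correctness; the `omega` tactic discharges the arithmetic goal,
-- so the claim is machine-checked rather than human-reviewed.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If the proof fails to check, the Lean toolchain rejects it outright, which is what lets verification replace manual review in high-stakes settings.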
Both models are released under an Apache 2.0 license and are designed for open deployment across enterprise and developer environments.
The model uses a sparse architecture with six billion active parameters and supports parallel inference with Lean acting as a verifier, allowing it to balance performance and cost.
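One common way to use a checker as a verifier during inference is best-of-n sampling: draw several candidate proofs in parallel and keep the first one the verifier accepts. A minimal sketch of that pattern is below; `generate` and `lean_check` are hypothetical stand-ins for model sampling and a Lean check invocation, as Mistral has not published its actual pipeline:

```python
import concurrent.futures

def verified_sample(generate, lean_check, n=8):
    """Draw n candidates in parallel, then return the first one the
    verifier accepts, or None if every candidate fails the check."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda _: generate(), range(n)))
    for cand in candidates:
        if lean_check(cand):
            return cand
    return None
```

Because the verifier is sound, raising n trades compute for a higher chance of a provably correct answer, which is the performance/cost balance the company describes.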
“Instead of debugging machine-generated logic, humans dictate what they want,” the company said.
Mistral is releasing Leanstral under the Apache 2.0 license, alongside API access and integration into its Vibe coding environment, as well as a new evaluation benchmark, FLTEval, designed to test real-world proof engineering tasks.
In benchmarking, Leanstral demonstrated efficiency gains over both open-source and proprietary models.
Against open-source systems, Leanstral-120B-A6B outperformed significantly larger models, exceeding GLM5-744B-A40B and Kimi-K2.5-1T-32B in a single pass, while Qwen3.5-397B-A17B required multiple passes to reach comparable scores. Leanstral achieved a score of 26.3 at pass@2 and scaled further to 29.3 at similar cost levels.
Compared with proprietary systems, Leanstral demonstrated a strong cost-performance advantage. At pass@2, it outperformed Claude Sonnet while costing $36 compared to $549, and at higher passes continued to extend that gap, though Claude Opus remained ahead on absolute quality at significantly higher cost.
Alongside Leanstral, Mistral introduced Mistral Small 4, a general-purpose model that combines reasoning, coding, and multimodal capabilities into a single system.
“With Small 4, users no longer need to choose between a fast instruct model, a powerful reasoning engine, or a multimodal assistant: one model now delivers all three,” the company said.
The model integrates capabilities from Mistral’s earlier systems and supports both text and image inputs, along with a configurable reasoning parameter to balance latency and depth.
Mistral Small 4 uses a mixture-of-experts architecture with 128 experts, four of which are active per token, and supports a context window of up to 256,000 tokens.
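Mistral has not published its router design; as a generic illustration of how a mixture-of-experts layer picks four of 128 experts per token, top-k routing can be sketched as follows (toy values, not Mistral's implementation):

```python
import math

def topk_route(logits, k=4):
    """Pick the top-k experts for one token from the router's logits
    and softmax-normalize their mixing weights (generic MoE routing)."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - m) for i in idx]
    total = sum(exps)
    return list(zip(idx, [e / total for e in exps]))

# One token's router logits over 128 experts (toy values).
logits = [0.0] * 128
logits[7], logits[42], logits[99], logits[3] = 2.0, 1.5, 1.0, 0.5
routes = topk_route(logits, k=4)  # only these 4 experts run for this token
```

Only the selected experts' weights are exercised for a given token, which is why a model of this size can run with a small active-parameter footprint.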
In internal evaluations, the model matched or exceeded GPT-OSS 120B across multiple benchmarks while generating shorter outputs. On AA LCR, it scored 0.72 using 1.6K characters, whereas Qwen models required significantly longer outputs to reach similar performance.
On LiveCodeBench, it outperformed GPT-OSS 120B while producing around 20% less output, which the company said translates into lower latency and reduced inference cost.