French AI startup Mistral AI has announced two new models—Leanstral, an open-source code agent built for formal verification, and Mistral Small 4, a unified multimodal reasoning model—as part of its push to expand open, production-ready AI systems.
Leanstral is positioned as the first open-source code agent designed specifically for Lean 4, targeting use cases where correctness must be formally proven rather than manually reviewed. The company said current coding agents face a scaling bottleneck due to human verification requirements in high-stakes domains.
“We envision a more helpful generation of coding agents to both carry out their tasks and formally prove their implementations against strict specifications,” Mistral said in its announcement.
Leanstral is designed to generate code and formally prove its correctness within Lean environments, operating across full proof engineering workflows rather than isolated tasks.
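To make the idea concrete, a Lean-native agent's output pairs an implementation with a machine-checked proof that the implementation meets its specification. The following is a minimal hand-written illustration in Lean 4 (not Leanstral output):

```lean
-- A trivial function together with a formal statement of its
-- correctness; the `omega` tactic discharges the arithmetic goal,
-- so the claim is machine-checked rather than human-reviewed.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If the proof fails to check, the Lean toolchain rejects it outright, which is what lets verification replace manual review in high-stakes settings.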
Both models are released under an Apache 2.0 license and are designed for open deployment across enterprise and developer environments.
The model uses a sparse architecture with six billion active parameters and supports parallel inference with Lean acting as a verifier, allowing it to balance performance and cost.
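One common way to use a checker as a verifier during inference is best-of-n sampling: draw several candidate proofs in parallel and keep the first one the verifier accepts. A minimal sketch of that pattern is below; `generate` and `lean_check` are hypothetical stand-ins for model sampling and a Lean check invocation, as Mistral has not published its actual pipeline:

```python
import concurrent.futures

def verified_sample(generate, lean_check, n=8):
    """Draw n candidates in parallel, then return the first one the
    verifier accepts, or None if every candidate fails the check."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda _: generate(), range(n)))
    for cand in candidates:
        if lean_check(cand):
            return cand
    return None
```

Because the verifier is sound, raising n trades compute for a higher chance of a provably correct answer, which is the performance/cost balance the company describes.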
“Instead of debugging machine-generated logic, humans dictate what they want,” the company said.
Mistral is releasing Leanstral under the Apache 2.0 license, alongside API access and integration into its Vibe coding environment, as well as a new evaluation benchmark, FLTEval, designed to test real-world proof engineering tasks.
In benchmarking, Leanstral demonstrated efficiency gains over both open-source and proprietary models.
Against open-source systems, Leanstral-120B-A6B outperformed significantly larger models, exceeding GLM5-744B-A40B and Kimi-K2.5-1T-32B in a single pass, while Qwen3.5-397B-A17B required multiple passes to reach comparable scores. Leanstral achieved a score of 26.3 at pass@2 and scaled further to 29.3 at similar cost levels.
Compared with proprietary systems, Leanstral demonstrated a strong cost-performance advantage. At pass@2, it outperformed Claude Sonnet while costing $36 compared to $549, and at higher passes continued to extend that gap, though Claude Opus remained ahead on absolute quality at significantly higher cost.
Alongside Leanstral, Mistral introduced Mistral Small 4, a general-purpose model that combines reasoning, coding, and multimodal capabilities into a single system.
“With Small 4, users no longer need to choose between a fast instruct model, a powerful reasoning engine, or a multimodal assistant: one model now delivers all three,” the company said.
The model integrates capabilities from Mistral’s earlier systems and supports both text and image inputs, along with a configurable reasoning parameter to balance latency and depth.
Mistral Small 4 uses a mixture-of-experts architecture with 128 experts, four of which are active per token, and supports a context window of up to 256,000 tokens.
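Mistral has not published its router design; as a generic illustration of how a mixture-of-experts layer picks four of 128 experts per token, top-k routing can be sketched as follows (toy values, not Mistral's implementation):

```python
import math

def topk_route(logits, k=4):
    """Pick the top-k experts for one token from the router's logits
    and softmax-normalize their mixing weights (generic MoE routing)."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - m) for i in idx]
    total = sum(exps)
    return list(zip(idx, [e / total for e in exps]))

# One token's router logits over 128 experts (toy values).
logits = [0.0] * 128
logits[7], logits[42], logits[99], logits[3] = 2.0, 1.5, 1.0, 0.5
routes = topk_route(logits, k=4)  # only these 4 experts run for this token
```

Only the selected experts' weights are exercised for a given token, which is why a model of this size can run with a small active-parameter footprint.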
In internal evaluations, the model matched or exceeded GPT-OSS 120B across multiple benchmarks while generating shorter outputs. On AA LCR, it scored 0.72 using 1.6K characters, whereas Qwen models required significantly longer outputs to reach similar performance.
On LiveCodeBench, it outperformed GPT-OSS 120B while producing around 20% less output, which the company said translates into lower latency and reduced inference cost.