Kimi K2 Thinking Crushes GPT-5, Claude Sonnet 4.5 in Key Benchmarks

Moonshot AI, a Chinese startup backed by Alibaba, released its latest AI model, Kimi K2 Thinking, on November 6. The model surpassed several leading AI systems, including OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5, on key reasoning and coding benchmarks.

https://twitter.com/Kimi_Moonshot/status/1986449512538513505

Moonshot said the model uses a mixture-of-experts architecture that activates 32 billion of its one trillion total parameters for each token, and supports a context window of up to 256,000 tokens.
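A model can hold one trillion parameters yet use only 32 billion per token (32B / 1T = 3.2%) because a mixture-of-experts layer sends each token to a few expert sub-networks instead of all of them. The toy sketch below illustrates generic top-k routing for a single layer; the expert count, expert sizes and router scores are made-up assumptions, not Kimi K2's actual configuration or Moonshot's implementation.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k experts with the highest router scores and return
    (expert_index, mixing_weight) pairs, renormalized over the chosen experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def active_fraction(num_experts: int, k: int,
                    params_per_expert: float, shared_params: float) -> float:
    """Share of all parameters used for one token: the always-on shared
    weights plus k of the num_experts expert blocks (toy accounting)."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Hypothetical router scores for one token over 8 toy experts.
router_logits = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4]
print(route_top_k(router_logits, k=2))  # experts 1 and 3 are selected

# Hypothetical toy layout: 8 experts of 100M params each, 2 active, 50M shared.
print(f"toy active share: {active_fraction(8, 2, 100e6, 50e6):.1%}")

# Figures reported for Kimi K2 Thinking: 32B active out of 1T total.
print(f"reported active share: {32e9 / 1e12:.1%}")
```

Only the selected experts' weights are read for a given token, which is why compute per token tracks the active parameter count rather than the total.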

The model can execute 200 to 300 sequential tool calls without human intervention.
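Long chains of tool calls are typically driven by a loop in which the model either requests a tool or returns a final answer, while the surrounding harness runs each requested tool and feeds the result back. The sketch below is a generic illustration under stated assumptions: `fake_model`, the `TOOLS` registry and the 300-step cap are hypothetical stand-ins, not Moonshot's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    tool: Optional[str]  # name of the tool to call, or None when finished
    argument: str = ""   # input passed to the tool
    answer: str = ""     # final answer when tool is None


# Hypothetical tool registry; a real agent would wrap search, code execution, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "add_one": lambda x: str(int(x) + 1),
}


def fake_model(history: list[str]) -> Step:
    """Hypothetical stand-in for a model call: keeps requesting the add_one
    tool until the latest value reaches 5, then returns a final answer."""
    value = int(history[-1])
    if value >= 5:
        return Step(tool=None, answer=f"done at {value}")
    return Step(tool="add_one", argument=str(value))


def run_agent(start: str, max_steps: int = 300) -> tuple[str, int]:
    """Execute tool calls one after another, feeding each result back to the
    model, until it answers or the step cap is hit. Returns (answer, calls)."""
    history = [start]
    for calls in range(max_steps):
        step = fake_model(history)
        if step.tool is None:
            return step.answer, calls
        history.append(TOOLS[step.tool](step.argument))
    return "step limit reached", max_steps


print(run_agent("0"))
```

The step cap is a safety bound: without human intervention, the harness needs some limit in case the model never produces a final answer.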

Benchmark results show that Kimi K2 Thinking scored 44.9% on Humanity’s Last Exam (with tools enabled), 60.2% on the BrowseComp web-search reasoning benchmark and 71.3% on SWE-bench Verified, benchmarks that together evaluate agentic reasoning, search and coding capabilities.

Moonshot said Kimi K2 Thinking is designed for explicit reasoning, with intermediate logical steps visible in its outputs to ensure transparency across multi-step workflows.

Moonshot AI said that, despite its trillion-parameter scale, Kimi K2 Thinking keeps runtime costs modest. The company lists pricing at $0.15 per one million input tokens for cache hits, $0.60 per one million input tokens for cache misses and $2.50 per one million output tokens.

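These rates are competitive with MiniMax-M2’s $0.30 input and $1.20 output pricing, undercutting it only on cache hits, and remain well below GPT-5, which is priced at $1.25 per million input tokens and $10 per million output tokens: roughly twice Kimi K2 Thinking’s uncached input rate and four times its output rate.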
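To make the price comparison concrete, the sketch below estimates the cost of one workload under the list prices quoted above. The workload size (10 million input and 2 million output tokens) is a hypothetical example, and because only a single input price is given for MiniMax-M2 and GPT-5, the sketch applies it to all of their input tokens, ignoring any cached-input discounts they may offer.

```python
# USD per one million tokens, as quoted above. For MiniMax-M2 and GPT-5 only one
# input price is given, so it is assumed to apply to all input tokens.
PRICES = {
    "Kimi K2 Thinking": {"input": 0.60, "input_cached": 0.15, "output": 2.50},
    "MiniMax-M2": {"input": 0.30, "input_cached": None, "output": 1.20},
    "GPT-5": {"input": 1.25, "input_cached": None, "output": 10.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int,
             cache_hit_rate: float = 0.0) -> float:
    """Estimated USD cost of one workload; cache_hit_rate is the assumed
    share of input tokens billed at the cached (cache-hit) price."""
    p = PRICES[model]
    cached_price = p["input_cached"] if p["input_cached"] is not None else p["input"]
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    cost = uncached * p["input"] + cached * cached_price + output_tokens * p["output"]
    return cost / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens, no cache hits.
for name in PRICES:
    print(f"{name}: ${job_cost(name, 10_000_000, 2_000_000):.2f}")
```

With these inputs the estimates come to $11.00 for Kimi K2 Thinking, $5.40 for MiniMax-M2 and $32.50 for GPT-5. Raising Kimi K2 Thinking's `cache_hit_rate` to 0.5 lowers its figure to $8.75, which shows how much repeated-context workloads depend on the cache-hit price.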

The open-source model is available under a Modified MIT License, permitting free commercial use with one attribution condition for high-scale deployments.

The launch of Kimi K2 Thinking comes as Chinese open-source AI models increasingly compete with proprietary US systems. Moonshot AI views the model as a crucial step toward making powerful AI technology more accessible.
