Microsoft on January 26 unveiled Maia 200, a custom AI accelerator aimed at lowering the cost of large-scale AI inference, positioning the chip against rival offerings from Google and Amazon as competition among hyperscalers intensifies.
The chip is built for large-scale inference workloads and, Microsoft says, delivers higher performance per dollar for AI token generation; it will be deployed across the company’s cloud infrastructure.
Maia 200 is fabricated on TSMC’s 3-nanometer process and includes native FP8 and FP4 tensor cores, 216GB of HBM3e memory delivering 7 TB/s of bandwidth, and 272MB of on-chip SRAM. Microsoft said the chip provides more than 10 petaFLOPS of FP4 performance and over 5 petaFLOPS of FP8 performance within a 750-watt thermal envelope.
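Those headline numbers invite a quick roofline-style sanity check. The sketch below uses only the published FLOPS and bandwidth figures; the model size and bytes-per-token assumptions are hypothetical, chosen to illustrate why memory bandwidth, rather than peak FLOPS, typically sets the ceiling on token generation.

```python
# Back-of-envelope roofline check using Microsoft's published Maia 200 figures.
# The workload assumptions (a hypothetical 200B-parameter FP4 model served at
# batch size 1) are illustrative, not vendor data.

FP4_FLOPS = 10e15   # >10 petaFLOPS FP4 (peak, per announcement)
HBM_BW = 7e12       # 7 TB/s HBM3e bandwidth (per announcement)

# Ridge point: arithmetic intensity (FLOPs per byte) needed to be compute-bound.
ridge = FP4_FLOPS / HBM_BW
print(f"Compute-bound above ~{ridge:.0f} FLOPs/byte")   # ~1429 FLOPs/byte

# Single-stream decoding streams every weight once per token, yielding only a
# few FLOPs per byte read -- far below the ridge point -- so token generation
# is memory-bandwidth bound and the 7 TB/s figure caps the token rate.
params = 200e9                   # hypothetical model size (assumption)
bytes_per_token = params * 0.5   # FP4 weights: half a byte per parameter
max_tokens_per_s = HBM_BW / bytes_per_token
print(f"Bandwidth-limited ceiling: ~{max_tokens_per_s:.0f} tokens/s per chip")
```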
According to Microsoft, Maia 200 delivers three times the FP4 performance of Amazon’s third-generation Trainium and FP8 performance exceeding that of Google’s seventh-generation TPU.
The company added that the accelerator offers 30% better performance per dollar than the latest-generation hardware in its current fleet.
Maia 200 will support multiple models, including the latest OpenAI GPT-5.2 models, and will be used across Microsoft Azure, Microsoft Foundry and Microsoft 365 Copilot. Microsoft’s Superintelligence team will also use the chip for synthetic data generation and reinforcement learning to improve in-house models.
“For synthetic data pipeline use cases, Maia 200’s design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered,” the company said.
The accelerator is currently deployed in Microsoft’s US Central datacenter region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, scheduled next. Additional regions will follow, the company said.
At the system level, Maia 200 uses a two-tier scale-up network based on Ethernet, rather than proprietary interconnects. Each accelerator provides 2.8 TB/s of bidirectional scale-up bandwidth and supports collective operations across clusters of up to 6,144 accelerators. Within a server tray, four Maia chips are directly connected to reduce latency and improve inference efficiency.
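Microsoft has not published collective-performance figures, but the standard ring all-reduce cost model gives a rough feel for what the quoted bandwidth implies at that scale. In the sketch below, the 2.8 TB/s and 6,144-accelerator figures come from the announcement; the payload size and the assumption that half the bidirectional bandwidth is usable in each direction are illustrative.

```python
# Rough all-reduce estimate for a Maia 200 scale-up domain using the standard
# ring all-reduce cost model, where each rank transfers 2 * (N - 1) / N times
# the payload. Ignores per-hop latency and Ethernet protocol overhead.

LINK_BW = 2.8e12 / 2   # assume 1.4 TB/s per direction (assumption)
N = 6144               # maximum scale-up domain, per the announcement
payload = 8e9          # hypothetical 8 GB tensor (assumption)

transfer_bytes = 2 * (N - 1) / N * payload
t = transfer_bytes / LINK_BW
print(f"Ring all-reduce of {payload/1e9:.0f} GB across {N} chips: ~{t*1e3:.1f} ms")
```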
Microsoft said Maia 200 was developed using a pre-silicon co-design approach that modelled large language model workloads early, allowing faster deployment once silicon became available.
Alongside the hardware, Microsoft announced a preview of the Maia software development kit, which includes PyTorch support, a Triton compiler, optimised kernels, a low-level programming language, and simulation and cost optimisation tools.
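Microsoft has not released SDK code, but Triton is an open kernel language, so a generic Triton kernel gives a sense of the programming model the announcement describes: kernels are written once in Python and lowered by a backend compiler, which the Maia SDK would supply. The example below is standard Triton, not Maia-specific code, and how the SDK targets Maia devices is not documented.

```python
# A standard Triton kernel (fused scale-and-add), illustrative of the kind of
# kernel the announced Maia Triton compiler would lower to Maia hardware.
import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, alpha, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)   # indices handled by this program
    mask = offs < n                            # guard the ragged final block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, alpha * x + y, mask=mask)

def scale_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)             # one program per 1024 elements
    scale_add_kernel[grid](x, y, out, alpha, n, BLOCK=1024)
    return out
```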
“Our Maia AI accelerator program is designed to be multi-generational,” the company said, adding that future versions are already in development as Microsoft expands Maia 200 across its global infrastructure.