Microsoft on January 26 unveiled Maia 200, a custom AI accelerator aimed at lowering the cost of large-scale AI inference, positioning the chip against rival offerings from Google and Amazon as competition among hyperscalers intensifies.
The chip is built for large-scale inference workloads and, Microsoft says, delivers higher performance per dollar for AI token generation; it will be deployed across the company’s cloud infrastructure.
Maia 200 is fabricated on TSMC’s 3-nanometer process and includes native FP8 and FP4 tensor cores, 216GB of HBM3e memory delivering 7 TB/s of bandwidth, and 272MB of on-chip SRAM. Microsoft said the chip provides more than 10 petaFLOPS of FP4 performance and over 5 petaFLOPS of FP8 performance within a 750-watt thermal envelope.
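Those headline numbers invite a quick roofline-style sanity check. The sketch below uses only the published FLOPS and bandwidth figures; the model size and bytes-per-token assumptions are hypothetical, chosen to illustrate why memory bandwidth, rather than peak FLOPS, typically sets the ceiling on token generation.

```python
# Back-of-envelope roofline check using Microsoft's published Maia 200 figures.
# The workload assumptions (a hypothetical 200B-parameter FP4 model served at
# batch size 1) are illustrative, not vendor data.

FP4_FLOPS = 10e15   # >10 petaFLOPS FP4 (peak, per announcement)
HBM_BW = 7e12       # 7 TB/s HBM3e bandwidth (per announcement)

# Ridge point: arithmetic intensity (FLOPs per byte) needed to be compute-bound.
ridge = FP4_FLOPS / HBM_BW
print(f"Compute-bound above ~{ridge:.0f} FLOPs/byte")   # ~1429 FLOPs/byte

# Single-stream decoding streams every weight once per token, yielding only a
# few FLOPs per byte read -- far below the ridge point -- so token generation
# is memory-bandwidth bound and the 7 TB/s figure caps the token rate.
params = 200e9                   # hypothetical model size (assumption)
bytes_per_token = params * 0.5   # FP4 weights: half a byte per parameter
max_tokens_per_s = HBM_BW / bytes_per_token
print(f"Bandwidth-limited ceiling: ~{max_tokens_per_s:.0f} tokens/s per chip")
```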
According to Microsoft, Maia 200 delivers three times the FP4 performance of Amazon’s third-generation Trainium and FP8 performance exceeding that of Google’s seventh-generation TPU.
The company added that the accelerator offers 30% better performance per dollar than the latest-generation hardware in its current fleet.
Maia 200 will support multiple models, including the latest OpenAI GPT-5.2 models, and will be used across Microsoft Azure, Microsoft Foundry and Microsoft 365 Copilot. Microsoft’s Superintelligence team will also use the chip for synthetic data generation and reinforcement learning to improve in-house models.
“For synthetic data pipeline use cases, Maia 200’s design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered,” the company said.
The accelerator is currently deployed in Microsoft’s US Central datacenter region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, scheduled next. Additional regions will follow, the company said.
At the system level, Maia 200 uses a two-tier scale-up network based on Ethernet, rather than proprietary interconnects. Each accelerator provides 2.8 TB/s of bidirectional scale-up bandwidth and supports collective operations across clusters of up to 6,144 accelerators. Within a server tray, four Maia chips are directly connected to reduce latency and improve inference efficiency.
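Microsoft has not published collective-performance figures, but the standard ring all-reduce cost model gives a rough feel for what the quoted bandwidth implies at that scale. In the sketch below, the 2.8 TB/s and 6,144-accelerator figures come from the announcement; the payload size and the assumption that half the bidirectional bandwidth is usable in each direction are illustrative.

```python
# Rough all-reduce estimate for a Maia 200 scale-up domain using the standard
# ring all-reduce cost model, where each rank transfers 2 * (N - 1) / N times
# the payload. Ignores per-hop latency and Ethernet protocol overhead.

LINK_BW = 2.8e12 / 2   # assume 1.4 TB/s per direction (assumption)
N = 6144               # maximum scale-up domain, per the announcement
payload = 8e9          # hypothetical 8 GB tensor (assumption)

transfer_bytes = 2 * (N - 1) / N * payload
t = transfer_bytes / LINK_BW
print(f"Ring all-reduce of {payload/1e9:.0f} GB across {N} chips: ~{t*1e3:.1f} ms")
```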
Microsoft said Maia 200 was developed using a pre-silicon co-design approach that modelled large language model workloads early, allowing faster deployment once silicon became available.
Alongside the hardware, Microsoft announced a preview of the Maia software development kit, which includes PyTorch support, a Triton compiler, optimised kernels, a low-level programming language, and simulation and cost optimisation tools.
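Microsoft has not released SDK code, but Triton is an open kernel language, so a generic Triton kernel gives a sense of the programming model the announcement describes: kernels are written once in Python and lowered by a backend compiler, which the Maia SDK would supply. The example below is standard Triton, not Maia-specific code, and how the SDK targets Maia devices is not documented.

```python
# A standard Triton kernel (fused scale-and-add), illustrative of the kind of
# kernel the announced Maia Triton compiler would lower to Maia hardware.
import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, alpha, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)   # indices handled by this program
    mask = offs < n                            # guard the ragged final block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, alpha * x + y, mask=mask)

def scale_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)             # one program per 1024 elements
    scale_add_kernel[grid](x, y, out, alpha, n, BLOCK=1024)
    return out
```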
“Our Maia AI accelerator program is designed to be multi-generational,” the company said, adding that future versions are already in development as Microsoft expands Maia 200 across its global infrastructure.