Microsoft has mostly relied on OpenAI’s large language models to power its own AI products such as Copilot. Now, the tech giant’s AI division, led by Mustafa Suleyman, has announced two new in-house AI models: MAI-1-preview and MAI-Voice-1.
In a blog post, Microsoft said that while MAI-1-preview “offers a glimpse of future offerings inside Copilot,” MAI-Voice-1 can generate a 60-second audio clip in under a second on a single GPU, making it one of the most efficient speech systems available to date.
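To put that throughput claim in perspective, it corresponds to a real-time factor of roughly 60, meaning the model produces audio about 60 times faster than it plays back. Here is a minimal back-of-the-envelope sketch using only the publicly stated numbers; the exact generation time and benchmarking conditions have not been disclosed, so the one-second figure below is an assumed upper bound:

```python
# Back-of-the-envelope real-time factor for MAI-Voice-1,
# based solely on Microsoft's publicly stated figures.
audio_seconds = 60.0        # length of the generated clip
generation_seconds = 1.0    # assumed upper bound ("under a second" per Microsoft)

# Real-time factor: seconds of audio produced per second of compute.
# A value above 1 means faster-than-real-time generation.
rtf = audio_seconds / generation_seconds
print(f"Real-time factor: ~{rtf:.0f}x faster than playback")  # ~60x
```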
The company plans to bring MAI-Voice-1 to Copilot Labs, where users will be able to try it out with storytelling demos. It described the two AI models as follows:
- MAI-Voice-1, Microsoft’s first highly expressive and natural speech generation model, is available in Copilot Daily and Podcasts, and as a brand-new Copilot Labs experience to try out. Voice is the interface of the future for AI companions, and MAI-Voice-1 delivers high-fidelity, expressive audio across both single- and multi-speaker scenarios.
- MAI-1-preview is now in public testing on LMArena, a popular platform for community model evaluation. It is MAI’s first foundation model trained end-to-end and offers a glimpse of future offerings inside Copilot.
MAI-1-preview was trained using approximately 15,000 NVIDIA H100 GPUs. Microsoft says that the AI model specialises “in following instructions and providing helpful responses to everyday queries.”
Over the coming weeks, the model will also become available for certain text-based use cases. The number of GPUs used to train Microsoft’s newest model is far smaller than what some rivals have deployed: xAI’s Grok, for instance, reportedly required more than 100,000 of these chips for training alone.
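For a rough sense of that gap, the two publicly cited fleet sizes differ by a factor of more than six. The sketch below compares only the reported GPU counts; actual training compute also depends on training duration, chip utilisation, and software efficiency, none of which have been disclosed:

```python
# First-order comparison of the reported training fleet sizes.
# GPU counts alone say nothing about training time or utilisation,
# so treat this strictly as a back-of-the-envelope figure.
mai_1_gpus = 15_000     # Microsoft's stated figure for MAI-1-preview (NVIDIA H100s)
grok_gpus = 100_000     # media-reported figure for xAI's Grok

ratio = grok_gpus / mai_1_gpus
print(f"Grok's reported fleet is ~{ratio:.1f}x larger")  # ~6.7x
```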