Jun 03, 2026 11:57:00

Microsoft has announced seven AI models, including 'MAI-Thinking-1,' which has performance equivalent to Claude Sonnet 4.6, and the voice clone model 'MAI-Voice-2.'

On June 2, 2026, Microsoft announced seven of its proprietary AI models, including the reasoning model 'MAI-Thinking-1' and the compact coding model 'MAI-Code-1-Flash.' Of the announced models, MAI-Thinking-1 is touted as having 'outperformed Anthropic's Claude Sonnet 4.6 in human evaluation.'

Building a hill-climbing machine: Launching seven new MAI models | Microsoft AI

https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

Microsoft has announced the following seven AI models. Of these, MAI-Image-2.5 was already announced on May 26, 2026, and is included again in this announcement.

MAI-Thinking-1: A MoE model with 1 trillion total parameters and 35 billion active parameters.
MAI-Code-1-Flash: A compact model with a total of 5 billion parameters. Enables high-speed execution of coding tasks.
MAI-Image-2.5: Image generation model. Ranked 3rd in the world for image generation and 2nd in the world for image editing.
MAI-Image-2.5 Flash: A high-speed image generation model.
MAI Transcribe-1.5: A high-speed and highly accurate transcription model supporting 43 languages, including Japanese.
MAI-Voice-2: A speech synthesis model that supports 15 languages, including Japanese.
MAI-Voice-2 Flash: A high-speed speech synthesis model.

◆MAI-Thinking-1
MAI-Thinking-1 is a MoE (Mixture of Experts) model with a total of 1 trillion parameters and 35 billion active parameters. According to Microsoft, the training data contains no AI-generated content and consists only of clean, appropriately licensed data. The model was also built on in-house infrastructure such as Microsoft's AI chip 'Maia 200,' which Microsoft presents as ensuring self-sufficiency. In addition, Microsoft explicitly states that no distillation from other companies' models was performed.
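In a MoE model, only a subset of the parameters is used for each token, which is why total and active parameter counts are quoted separately. As a rough illustration, the following sketch computes the share of parameters that is active, using the two figures given above. The function name `active_fraction` is hypothetical and introduced only for this example.

```python
# Parameter counts for MAI-Thinking-1 as quoted in the article.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 35_000_000_000     # 35 billion active parameters


def active_fraction(total_params: int, active_params: int) -> float:
    """Return the share of parameters that are active (hypothetical helper)."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 <= active_params <= total_params:
        raise ValueError("active_params must be between 0 and total_params")
    return active_params / total_params


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share of parameters: {share:.1%}")
```

By these figures, about 3.5% of the parameters are active at a time. This says nothing about how experts are routed, which the article does not describe.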

The table below shows the benchmark results for 'MAI-Thinking-1,' 'Claude Sonnet 4.6,' 'Claude Opus 4.6,' 'GPT-5.4,' 'Kimi K2.6,' 'DeepSeek V3.2,' 'DeepSeek V4,' and 'GLM-5.1.' MAI-Thinking-1 beat Claude Sonnet 4.6 on AIME 2025, a benchmark of mathematical problem solving. Although it scored lower than Claude Sonnet 4.6 on the other benchmarks, Microsoft claims that 'in human evaluations of its ability to perform 1,276 tasks, MAI-Thinking-1 was rated as higher performance than Claude Sonnet 4.6.'

MAI-Thinking-1 is now available in private preview on Microsoft Foundry and will soon be available on MAI Playground.

◆MAI-Code-1-Flash
MAI-Code-1-Flash is a coding model with a total of 5 billion parameters. The graph below compares the benchmark scores of MAI-Code-1-Flash and Claude Haiku 4.5. MAI-Code-1-Flash consistently records higher scores than Claude Haiku 4.5.

MAI-Code-1-Flash will be gradually made available for Visual Studio Code and GitHub Copilot.

◆MAI-Image-2.5 and MAI-Image-2.5 Flash
MAI-Image-2.5 is an image generation AI that produces high-quality images by reasoning carefully about the subject, scene structure, lighting, size, and spatial relationships. It also excels at drawing characters within images according to instructions.

MAI-Image-2.5 is ranked 3rd in the category of generating images from text and 2nd in the category of editing images on the AI ranking service 'Arena'.

Microsoft releases image generation AI 'MAI-Image-2.5,' boasting the world's third-best performance in generating images from text - GIGAZINE

MAI-Image-2.5 Flash is offered as a faster and more cost-effective model compared to MAI-Image-2.5.

MAI-Image-2.5 and MAI-Image-2.5 Flash are available via the Microsoft Foundry API. The API fees for MAI-Image-2.5 per million tokens are $5 (approx. 799 yen) for text input, $8 (approx. 1279 yen) for image input, and $47 (approx. 7512 yen) for image output. The API fees for MAI-Image-2.5 Flash per million tokens are $1.75 (approx. 280 yen) for text input, $1.75 (approx. 280 yen) for image input, and $19.50 (approx. 3117 yen) for image output.
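To make the per-million-token pricing concrete, here is a minimal sketch that estimates the cost in US dollars for a given token usage. The prices are the ones quoted above. The dictionary layout and the function name `estimate_cost_usd` are hypothetical and are not part of any Microsoft API. Actual billing may differ.

```python
# USD per one million tokens, as quoted in the article.
PRICES_USD_PER_MILLION = {
    "MAI-Image-2.5": {
        "text_input": 5.00,
        "image_input": 8.00,
        "image_output": 47.00,
    },
    "MAI-Image-2.5 Flash": {
        "text_input": 1.75,
        "image_input": 1.75,
        "image_output": 19.50,
    },
}


def estimate_cost_usd(
    model: str,
    text_input_tokens: int = 0,
    image_input_tokens: int = 0,
    image_output_tokens: int = 0,
) -> float:
    """Estimate the API cost in USD for one request (hypothetical helper)."""
    prices = PRICES_USD_PER_MILLION[model]
    usage = {
        "text_input": text_input_tokens,
        "image_input": image_input_tokens,
        "image_output": image_output_tokens,
    }
    # Each price applies per one million tokens of the matching kind.
    return sum(prices[kind] * usage[kind] / 1_000_000 for kind in prices)


if __name__ == "__main__":
    for name in PRICES_USD_PER_MILLION:
        cost = estimate_cost_usd(
            name, text_input_tokens=2_000, image_output_tokens=4_000
        )
        print(f"{name}: ${cost:.4f}")
```

The token counts in the usage example are arbitrary placeholders, since the article does not say how many tokens a typical image consumes.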

◆MAI Transcribe-1.5
MAI Transcribe-1.5 is a transcription model that supports 43 languages, including Japanese. The graph below shows the results of an error rate test conducted by the third-party organization Artificial Analysis, where a smaller value on the vertical axis indicates higher accuracy in transcription. MAI Transcribe-1.5 has a lower error rate than its predecessor, MAI Transcribe-1, and is considered a highly accurate transcription model.
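The article does not say which error metric Artificial Analysis used. Assuming it is a word error rate (WER), a common metric for transcription, the sketch below shows how such a score can be computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and real evaluations usually normalize text (case, punctuation) first, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein dynamic program).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in output
            insertion = curr[j - 1] + 1   # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A lower value means fewer word-level mistakes, which matches the article's note that smaller values indicate higher transcription accuracy.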

The graph below shows the error rate on the vertical axis and processing speed on the horizontal axis. MAI Transcribe-1.5 achieves both a low error rate and high processing speed.

MAI Transcribe-1.5 is available via API. Details of the API are available at the following link.

MAI-Transcribe in LLM Speech API - Speech Service - Foundry Tools | Microsoft Learn

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/mai-transcribe?pivots=ai-foundry

◆MAI-Voice-2 and MAI-Voice-2 Flash
MAI-Voice-2 is a speech synthesis model that supports 15 languages, including Japanese. Given a sample of a person's speech, it can speak arbitrary text in the same voice. The graph below shows the results of a human evaluation asking which was preferable: 'Speech synthesized by MAI-Voice-2 (dark red)' or 'Human recording (light red).' The results show that MAI-Voice-2 was rated as roughly equivalent to a real human.

MAI-Voice-2 is available via API. Details about the API can be found at the link below. Additionally, a low-cost and highly efficient MAI-Voice-2 Flash version will be available soon.

Build Multilingual TTS with MAI-Voice-2-Preview | Forgebook
https://microsoft-foundry.github.io/forgebook/notebook/mai-voice-2/

Jun 03, 2026 11:57:00 in AI, Posted by log1o_hf