Google announces next-generation reasoning AI model 'Gemini 2.5', greatly improving reasoning and coding performance

Google has announced 'Gemini 2.5', its next-generation series of reasoning AI models, and is releasing the multimodal model Gemini 2.5 Pro Experimental as the first model in the series. Google is promoting Gemini 2.5 Pro Experimental, which has strong reasoning and coding capabilities, as its 'most intelligent model.'
Gemini 2.5: Our newest Gemini model with thinking
Conventional large language models struggle with complex tasks that require logical reasoning, such as math problems and coding. Reasoning models, by contrast, spend additional computing power and time checking facts and reasoning through a problem before answering, which lets them produce highly accurate results even on math and coding tasks.
Since OpenAI announced its first reasoning model, 'OpenAI o1', in September 2024, AI companies have been competing to give their own models reasoning capabilities comparable or superior to those of OpenAI o1. At the time of writing, Anthropic, xAI, DeepSeek and others are developing reasoning models in addition to OpenAI.
Google has also been developing reasoning models and released its first one, Gemini 2.0 Flash Thinking, in December 2024. This model adds the ability to generate a thought process to the multimodal model Gemini 2.0 Flash.
Google releases 'Gemini 2.0 Flash Thinking', an AI model that introduces a thought process and enhances inference, and exceeds OpenAI's o1-preview and GPT-4o in various tests - GIGAZINE

Google says that the newly announced Gemini 2.5 series has enhanced reasoning and coding capabilities over Gemini 2.0 Flash Thinking.
Below is a summary of the benchmark results published by Google. In 'Reasoning & knowledge', 'Science', and 'Mathematics', Gemini 2.5 Pro Experimental recorded top-class scores compared with OpenAI o3-mini, OpenAI GPT-4.5, Claude 3.7 Sonnet, Grok 3 Beta, and DeepSeek-R1.

According to Google, Gemini 2.5 Pro Experimental scored 68.6% on Aider Polyglot, a benchmark that measures code editing, beating OpenAI o3-mini, Claude 3.7 Sonnet, and DeepSeek-R1. Gemini 2.5 Pro Experimental also scored 63.8% on SWE-bench Verified, a software development benchmark for AI, beating OpenAI o3-mini and DeepSeek-R1. However, Anthropic's Claude 3.7 Sonnet scored 70.3% on SWE-bench Verified, outperforming Gemini 2.5 Pro Experimental by 6.5 percentage points.
In the following video, Google shows Gemini 2.5 Pro Experimental generating a simple game from a one-line prompt.
Gemini 2.5: Create your own dinosaur game from a single line prompt - YouTube
At the time of writing, Gemini 2.5 Pro supports up to 1 million input tokens and 64,000 output tokens. It is available on the developer platform Google AI Studio, as well as in the Gemini app for subscribers to the Gemini Advanced plan, which costs 2,900 yen per month. API pricing had not been announced at the time of writing, but Google said it will share details soon.
Simon Willison, a developer who tests a wide range of AI models, ran his own tests and, while noting that he had only scratched the surface, gave high marks to Gemini 2.5 Pro's accuracy in text interpretation, image recognition, and audio recognition. 'The Gemini 2.5 Pro is a very powerful new model,' he wrote on his blog.