Google releases 'Gemini Deep Research Agent' and open-sources benchmark 'DeepSearchQA'

Google has announced that its Gemini Deep Research agent, which significantly enhances the web search capabilities of Gemini Deep Research, is now available to developers via the Interactions API. Google has also open-sourced DeepSearchQA, a benchmark for evaluating deep search agents.
Build with Gemini Deep Research

Gemini Deep Research Agent | Gemini API | Google AI for Developers
Gemini Deep Research is an AI search feature optimized for long-running context gathering and synthesis tasks. It launched in December 2024 and has supported Japanese since January 2025.
Google's AI search function 'Deep Research,' which can perform information gathering tasks that would take hours manually in just a few minutes, is now available in Japanese - GIGAZINE

According to Lucas Haas, a product manager at Google DeepMind, the new Gemini Deep Research agent uses Gemini 3 Pro, which Google describes as its most factual model to date, as its reasoning core. The agent has been specially trained to reduce hallucinations and maximize report quality on complex tasks. In addition, by scaling up multi-step reinforcement learning for search, the agent can autonomously navigate complex information environments with high accuracy.
The Gemini Deep Research agent achieved state-of-the-art results on three benchmark tests, which compared five models: Gemini Deep Research, Gemini 3 Pro, o4-mini deep research, o3-deep research, and GPT-5 Pro. In the results chart, Gemini Deep Research (blue) outperforms Gemini 3 Pro (light blue) and scores comparable to or better than GPT-5 Pro (diagonal hatching).

The Gemini Deep Research agent is being released via the Interactions API, allowing developers to embed it directly into their apps. It will also soon be available in Google Search, NotebookLM, Google Finance, and the Gemini app.
In addition, Google has decided to open-source DeepSearchQA, the benchmark used in the evaluation. DeepSearchQA measures an agent's ability to carry out complex, multi-step information search tasks. It contains 900 'causal chain' tasks across 17 domains, in which each step depends on the analysis done in earlier steps. Whereas previous benchmarks tested the retrieval of individual facts, DeepSearchQA measures comprehensiveness: the agent must generate an exhaustive answer set, which is scored on both precision and recall. The benchmark can also be used to diagnose the effect of 'thinking time' on results.
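To make the idea of scoring an exhaustive answer set concrete, the following is a minimal sketch of how precision and recall could be computed for a single task. It is not DeepSearchQA's official scoring code: the function name `score_answer_set`, the sample answers, and the use of case-insensitive exact matching are all assumptions made for illustration.

```python
def score_answer_set(predicted, expected):
    """Return (precision, recall, f1) for one task's answer set.

    Hypothetical helper: matching is a simple case-insensitive exact
    comparison, which is an assumption, not the benchmark's real matcher.
    """
    # Normalize so that case and surrounding whitespace do not cause misses.
    pred = {p.strip().lower() for p in predicted}
    gold = {g.strip().lower() for g in expected}

    hits = len(pred & gold)  # answers that are both returned and correct

    # Precision: what fraction of the returned answers are correct.
    precision = hits / len(pred) if pred else 0.0
    # Recall: what fraction of the correct answers were returned.
    recall = hits / len(gold) if gold else 0.0
    # F1: harmonic mean of the two, defined as 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up example task with four correct answers.
expected = ["Alpha", "Beta", "Gamma", "Delta"]
# The agent found two of them and added one wrong answer.
predicted = ["alpha", "beta", "epsilon"]

precision, recall, f1 = score_answer_set(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up case the agent is penalized twice: the wrong answer lowers precision, and the two missing answers lower recall. That is why an exhaustive answer set is harder to produce than a single correct fact.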
Google DeepMind makes resources such as datasets, leaderboards, and technical reports publicly available to further research into building better agents.
in AI, Posted by logc_nt






