Google releases 'Gemini Deep Research Agent' and open-sources benchmark 'DeepSearchQA'



Google has announced that it is making its Gemini Deep Research agent, a substantially upgraded version of the web research capability behind Gemini Deep Research, available to developers via the Interactions API. Google has also open-sourced DeepSearchQA, a benchmark for evaluating deep search agents.

Build with Gemini Deep Research

https://blog.google/technology/developers/deep-research-agent-gemini-api/



Gemini Deep Research Agent | Gemini API | Google AI for Developers

https://ai.google.dev/gemini-api/docs/deep-research

Gemini Deep Research is an AI research feature optimized for long-running context gathering and synthesis tasks. It launched in December 2024 and has supported Japanese since January 2025.

Google's AI search function 'Deep Research,' which can perform information gathering tasks that would take hours manually in just a few minutes, is now available in Japanese - GIGAZINE



According to Lucas Haas, product manager at Google DeepMind, the all-new Gemini Deep Research agent uses Gemini 3 Pro, Google's most factual model to date, as its reasoning core. It has been specifically trained to reduce 'hallucinations' and maximize report quality on complex tasks. Furthermore, by scaling multi-step reinforcement learning for search, the agent can autonomously navigate complex information environments with high accuracy.

The Gemini Deep Research agent has achieved state-of-the-art results on Humanity's Last Exam (HLE), billed as the most challenging AI test, and on DeepSearchQA, a benchmark for web search tasks. It also achieved Google's best score to date on BrowseComp, OpenAI's challenging browsing benchmark.

Below are the scores for three benchmark tests performed on five models: Gemini Deep Research, Gemini 3 Pro, o4-mini deep research, o3-deep research, and GPT-5 Pro. You can see that Gemini Deep Research, shown in blue, outperforms Gemini 3 Pro, shown in light blue, and scores comparable to or better than GPT-5 Pro, shown in diagonal lines.



The Gemini Deep Research agent is being released via the Interactions API, allowing developers to embed it directly into their apps. It will also soon be available in Google Search, NotebookLM, Google Finance, and the Gemini app.

In addition, Google is open-sourcing 'DeepSearchQA,' the benchmark used in its evaluation. DeepSearchQA is a benchmark for evaluating an agent's ability to perform complex, multi-step information search tasks. It comprises 900 'causal chain' tasks across 17 domains, in which each step depends on the analysis from prior steps. Whereas previous benchmarks have mostly tested single facts, DeepSearchQA measures comprehensiveness by having the agent generate an exhaustive answer set, which evaluates both search precision and search recall. It also serves as a diagnostic tool for the effect of 'thinking time,' that is, how much performance improves when the agent is allowed to search and reason for longer.
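
To make the idea of scoring an exhaustive answer set concrete, the following is a minimal sketch in Python of how precision and recall could be computed for a single task. It is an illustration only, not DeepSearchQA's actual grader: the function names `normalize` and `score_answer_set` are hypothetical, and matching answers by case- and whitespace-insensitive exact comparison is an assumption made here for simplicity.

```python
from typing import Iterable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def score_answer_set(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one task.

    Assumption: an answer counts as correct only if it exactly matches a
    gold answer after normalization. The real benchmark may grade differently.
    """
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    hits = len(pred & ref)

    # Precision: share of the agent's answers that are correct.
    precision = hits / len(pred) if pred else 0.0
    # Recall: share of the expected answers that the agent found.
    recall = hits / len(ref) if ref else 0.0
    # F1: harmonic mean, penalizing both missing and spurious answers.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    agent_answers = ["Paris", "berlin ", "Rome"]
    expected = ["paris", "Berlin", "Madrid", "Lisbon"]
    p, r, f = score_answer_set(agent_answers, expected)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example the agent returns three answers, two of which are correct, while missing two of the four expected answers. An agent that stops searching early tends to lose recall, and one that pads its list with guesses loses precision, which is why a set-based score captures comprehensiveness better than a single-fact check.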

Google DeepMind makes resources such as datasets, leaderboards, and technical reports publicly available to further research into building better agents.

in AI, Posted by logc_nt