Google’s New Deep Research Agent Scores SoTA Results on Benchmarks

Google also announced a new benchmark called DeepSearchQA, designed to test agent comprehensiveness on web research tasks.

Google has announced the Gemini Deep Research agent, which is built on the Gemini 3 Pro model and is available to developers through the Interactions API, letting them integrate autonomous research capabilities into their own applications.

“By scaling multi-step reinforcement learning for search, the agent autonomously navigates complex information landscapes with high accuracy,” the company said.

The tool plans its research iteratively: it formulates queries, reads sources, identifies gaps and then searches again to fill them. “This release features vastly improved web search, allowing it to navigate deep into sites for specific data,” Google added.
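To make that loop concrete, here is a toy sketch of a plan, search, read and find-gaps cycle. It is not Google's implementation: `TOY_WEB`, `search`, `plan_queries` and `research` are hypothetical names, the "web" is a small in-memory dictionary, and a "gap" is simply a required fact the agent has not collected yet.

```python
# Hypothetical stand-in for the web: each query maps to the facts its results contain.
TOY_WEB: dict[str, set[str]] = {
    "gemini deep research benchmarks": {"hle_score", "browsecomp_score"},
    "deepsearchqa": {"deepsearchqa_score", "task_count"},
    "gemini 3 pro pricing": {"input_price", "output_price"},
}


def search(query: str) -> set[str]:
    """Return the facts a (toy) search for `query` surfaces."""
    return TOY_WEB.get(query, set())


def plan_queries(gaps: set[str]) -> list[str]:
    """Formulate queries whose results would cover at least one missing fact."""
    return [q for q, facts in TOY_WEB.items() if facts & gaps]


def research(required: set[str], max_rounds: int = 5) -> tuple[set[str], int]:
    """Search, read and re-plan until no gaps remain or the round budget is spent."""
    found: set[str] = set()
    rounds = 0
    while rounds < max_rounds:
        gaps = required - found          # identify what is still missing
        if not gaps:
            break
        queries = plan_queries(gaps)     # target the gaps with new queries
        if not queries:
            break                        # nothing left that could fill the gaps
        found |= search(queries[0])      # read one source per round (toy budget)
        rounds += 1
    return found & required, rounds


if __name__ == "__main__":
    need = {"hle_score", "deepsearchqa_score", "output_price"}
    print(research(need))
```

With the three facts above, the loop needs three rounds, because each search fills only some of the remaining gaps and the next round re-plans around what is still missing.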

On the Humanity’s Last Exam benchmark, which tests AI models on expert-level reasoning and problem-solving across a broad range of academic subjects, the Deep Research agent scored 46.4%, outperforming OpenAI’s GPT-5 Pro (38.9%).

On the BrowseComp benchmark, which evaluates how well LLMs locate ‘hard-to-find’ facts, Google’s Deep Research agent scored 59.2%, just below GPT-5 Pro (59.5%).

Alongside the agent, Google introduced DeepSearchQA, a new benchmark designed to test agent comprehensiveness on web research tasks. On it, the Gemini Deep Research agent scored 66.1%, ahead of GPT-5 Pro (65.2%).

DeepSearchQA comprises 900 hand-crafted tasks across 17 fields, in which each step depends on the results of earlier analysis. “Unlike traditional fact-based tests, DeepSearchQA measures comprehensiveness, requiring agents to generate exhaustive answer sets. This assesses both research precision and retrieval recall,” Google said.
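Scoring an exhaustive answer set typically means checking both how much of the agent's list is correct (precision) and how much of the correct list it found (recall). The sketch below uses standard set precision, recall and F1; how DeepSearchQA actually grades and aggregates answers is not described here, so the function name `set_scores` and the example answer sets are hypothetical.

```python
def set_scores(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Standard set-based precision, recall and F1 for an answer list."""
    true_pos = len(predicted & gold)  # answers that are both returned and correct
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical task: "list every country that meets criterion X".
    gold = {"France", "Germany", "Italy", "Spain"}
    predicted = {"France", "Germany", "Portugal"}
    print(set_scores(predicted, gold))
```

Here the agent gets two of its three answers right (precision 2/3) but finds only two of the four correct answers (recall 1/2), which is the kind of incompleteness a comprehensiveness benchmark is meant to penalise.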

Google said the agent will soon be available in Google Search, NotebookLM, Google Finance and the Gemini app.

API pricing matches that of Gemini 3 Pro: input tokens cost $2 per million, while output tokens cost $12 per million for prompts of up to 200,000 tokens and $18 per million for longer prompts.
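As a rough illustration of how those rates combine, the sketch below estimates the cost of a single request using only the figures reported above. It assumes that "prompt length" equals the request's input-token count and that the input rate is flat, as stated in this article; the constant names and `estimate_cost_usd` are hypothetical, and Google's official pricing page should be treated as authoritative.

```python
# Rates as reported in this article (USD per million tokens); assumptions, not official values.
INPUT_PER_M = 2.00
OUTPUT_PER_M_SHORT = 12.00   # prompts up to 200,000 tokens
OUTPUT_PER_M_LONG = 18.00    # prompts longer than 200,000 tokens
LONG_PROMPT_THRESHOLD = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, picking the output rate by prompt length."""
    if input_tokens > LONG_PROMPT_THRESHOLD:
        output_rate = OUTPUT_PER_M_LONG
    else:
        output_rate = OUTPUT_PER_M_SHORT
    return (input_tokens * INPUT_PER_M + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(estimate_cost_usd(100_000, 10_000))  # short prompt: 0.20 + 0.12
    print(estimate_cost_usd(250_000, 10_000))  # long prompt: 0.50 + 0.18
```

The same 10,000 output tokens cost 50% more once the prompt crosses the 200,000-token threshold, so long-context requests pay a premium on the output side.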

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
