Google’s New Deep Research Agent Scores SoTA Results on Benchmarks

Google has announced the Gemini Deep Research agent, which is built on the Gemini 3 Pro model and is available to developers through the Interactions API. It lets developers integrate autonomous research capabilities into their applications.

“By scaling multi-step reinforcement learning for search, the agent autonomously navigates complex information landscapes with high accuracy,” the company said.

The agent plans its research iteratively: it formulates queries, reads sources, identifies gaps and searches again to fill them. “This release features vastly improved web search, allowing it to navigate deep into sites for specific data,” Google added.
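The loop described above (query, read, find gaps, search again) can be sketched in a few lines of Python. This is a toy illustration only, not Google's implementation: the function `run_research`, the `search` callable and the rule for broadening a query on later rounds are all hypothetical names and choices made for this example.

```python
from typing import Callable, Optional


def run_research(
    question: str,
    topics: list[str],
    search: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Toy query -> read -> find gaps -> search-again loop (hypothetical)."""
    notes: dict[str, str] = {}   # what has been learned so far, per topic
    gaps = list(topics)          # topics that still lack an answer

    for round_no in range(max_rounds):
        if not gaps:
            break                # nothing left to fill
        remaining = []
        for topic in gaps:
            # Assumption: the first round uses a specific query and
            # later rounds retry with a broader one.
            query = f"{question} {topic}" if round_no == 0 else topic
            text = search(query)
            if text is None:
                remaining.append(topic)   # still a gap, retry next round
            else:
                notes[topic] = text
        gaps = remaining

    return notes, gaps
```

Passing a dictionary lookup such as `{"gemini pricing": "...", "benchmarks": "..."}.get` as `search` is enough to try it out. Any topic that is still unanswered after `max_rounds` is returned as a remaining gap rather than silently dropped.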

On the Humanity’s Last Exam benchmark, which tests AI models on expert-level reasoning and problem-solving across a broad range of academic subjects, the Deep Research agent scored 46.4%, outperforming OpenAI’s GPT-5 Pro (38.9%).

On the BrowseComp benchmark, which evaluates LLMs on locating ‘hard-to-find’ facts, Google’s Deep Research agent scored 59.2%, slightly below GPT-5 Pro (59.5%).

Alongside the release, Google introduced a new benchmark called DeepSearchQA, designed to test agent comprehensiveness on web research tasks. The Gemini Deep Research agent scored 66.1%, outperforming GPT-5 Pro (65.2%).

DeepSearchQA features 900 hand-crafted tasks across 17 fields, where each step depends on prior analysis. “Unlike traditional fact-based tests, DeepSearchQA measures comprehensiveness, requiring agents to generate exhaustive answer sets. This assesses both research precision and retrieval recall,” Google said.
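To make the precision and recall distinction concrete, here is a minimal sketch using the standard set-based definitions. The article does not describe DeepSearchQA's actual scoring procedure, so the function `precision_recall` and the sample answers are assumptions made purely for illustration.

```python
def precision_recall(predicted, expected):
    """Standard set-based precision and recall (not DeepSearchQA's official scorer)."""
    predicted, expected = set(predicted), set(expected)
    hits = len(predicted & expected)  # answers that are both given and correct

    # Precision: share of the returned answers that are correct.
    precision = hits / len(predicted) if predicted else 0.0
    # Recall: share of the correct answers that were returned.
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical task: the full answer set has three items; the agent
# returns four, of which two are correct.
p, r = precision_recall({"A", "B", "C", "D"}, {"A", "B", "E"})
print(f"precision={p:.2f} recall={r:.2f}")
```

An agent that returns only one correct item would score perfect precision but low recall, which is the gap an exhaustive-answer benchmark is meant to expose.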

Google said the agent will soon be available on Google Search, NotebookLM, Google Finance and the Gemini app.

The API pricing matches the Gemini 3 Pro model: it costs $2 per million input tokens, while output tokens are priced at $12 per million for prompts up to 200,000 tokens and $18 per million for prompts that exceed that length.
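As a rough worked example, the rates quoted above can be turned into a cost estimate. This sketch uses only the figures in this article; the function name `estimate_cost_usd` is hypothetical, and it assumes the $2 input rate applies at every prompt length because no other input rate is stated here. Check Google's pricing page before relying on the numbers.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost from the per-million-token rates quoted in the article."""
    input_rate = 2.00  # USD per million input tokens (assumed flat)
    # Output rate depends on whether the prompt exceeds 200,000 tokens.
    output_rate = 12.00 if input_tokens <= 200_000 else 18.00
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 150,000 input tokens and 20,000 output tokens: 0.30 + 0.24 USD
print(round(estimate_cost_usd(150_000, 20_000), 2))
# 300,000 input tokens and 20,000 output tokens: 0.60 + 0.36 USD
print(round(estimate_cost_usd(300_000, 20_000), 2))
```

A research agent may read many sources in a single task, so the input side of the estimate is likely to dominate in practice.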
