Following the launch of its new Claude Sonnet 4.6 model, Anthropic has released new web search and web fetch tools.
The tools enable Claude to write and execute code during web searches, filtering and refining results before they enter the model’s context window. Anthropic says this improves accuracy and token efficiency.
“Agents using basic web search tools need to make a query, pull search results into context, fetch full HTML files from multiple websites, and reason over it all before responding,” said Anthropic in a blog post. “But the context being pulled in from search is often irrelevant, which degrades the quality of the response.”
Instead of pulling full HTML files into context, the new web search and web fetch tools dynamically filter search results, passing on only the relevant information and discarding the rest.
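To illustrate the idea, here is a minimal sketch (not Anthropic’s actual sandbox code) of the kind of filtering code the model might write and run. The `search()` and `fetch_page()` helpers are hypothetical stand-ins for the real tools, and the keyword-matching logic is an assumption for illustration only.

```python
# A minimal sketch (not Anthropic's actual sandbox code) of the filtering
# pattern described above. search() and fetch_page() are hypothetical
# stand-ins for the real web search and web fetch tools.

def search(query: str) -> list[dict]:
    """Stand-in for the web search tool: returns result metadata."""
    return [{"url": "https://example.com/a", "title": "Example A"}]

def fetch_page(url: str) -> str:
    """Stand-in for the web fetch tool: returns the page's text."""
    return "A relevant paragraph about token efficiency.\n\nUnrelated navigation boilerplate."

def relevant_snippets(query: str, keywords: list[str], limit: int = 3) -> list[str]:
    """Keep only paragraphs mentioning the keywords; discard everything
    else so it never enters the context window."""
    snippets: list[str] = []
    for result in search(query):
        for para in fetch_page(result["url"]).split("\n\n"):
            if any(k.lower() in para.lower() for k in keywords):
                snippets.append(para.strip())
                if len(snippets) >= limit:
                    return snippets
    return snippets

print(relevant_snippets("token efficiency", ["token"]))
```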
On the BrowseComp evaluation, which tests whether an AI model can navigate a wide range of websites to locate deliberately hard-to-find information, dynamic filtering lifts Sonnet 4.6’s score from 33.3% to 46.6% and Opus 4.6’s from 45.3% to 61.6%.
Performance improvements were also observed on the DeepSearchQA benchmark, which tests whether an agent can systematically plan and execute multi-step searches without missing any answers.
The feature will be enabled by default when using the new web search and web fetch tools with Sonnet 4.6 and Opus 4.6 on the Claude API.
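For developers, a request might look something like the sketch below using the Anthropic Python SDK. The tool `type` strings and model ID are assumptions for illustration, since the article does not give the exact identifiers; check Anthropic’s documentation for the real values.

```python
# Hypothetical request shape using the Anthropic Python SDK (pip install anthropic).
# The tool type strings and model ID below are assumptions for illustration;
# check Anthropic's documentation for the identifiers of the new tools.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID
    max_tokens=1024,
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},  # assumed version string
        {"type": "web_fetch_20250910", "name": "web_fetch"},                    # assumed version string
    ],
    messages=[
        {"role": "user", "content": "Find and summarize recent BrowseComp results."}
    ],
)
print(response.content)
```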
Users and developers on social media praised the update.
Nick Dobos, an engineer at The Browser Company, wrote on X, “This unlocks crazy amounts of complex function calling.”
“For example, say you are querying a database. Previously, you would do one query, then Claude would read that result and then query again if needed,” he explained. “Now Claude writes code to call the tool, then that code can handle the result and do different things, like query again, strip or format data, and change what it’s doing based on the tool call result, all before being sent back to Claude. The code that Claude writes pre-plans how to react to the tool result.”
He explained that this compresses LLM agent loops: the agent no longer needs to return to the LLM for a fresh decision at every step. “Instead, the LLM pre-bakes potentially hundreds or thousands of decision paths.”
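As a rough illustration of the pattern Dobos describes, the sketch below uses a hypothetical `run_query()` helper standing in for a database tool. The point is that the pre-written code branches on the tool result itself, with no round trip to the model between steps.

```python
# A sketch of the pattern Dobos describes. run_query() is a hypothetical
# stand-in for a database tool; the generated code handles the result,
# retries, and formats the data before anything returns to the model.

def run_query(sql: str) -> list[dict]:
    """Hypothetical database-tool wrapper returning rows as dicts."""
    return []  # stand-in result

def find_active_users() -> list[dict]:
    rows = run_query("SELECT id, name FROM users WHERE active = 1")
    if not rows:
        # Pre-baked fallback path: broaden the query instead of
        # returning to the model to ask what to do next.
        rows = run_query("SELECT id, name FROM users")
    # Strip and format the data before it is sent back to the model.
    return [{"id": r["id"], "name": r["name"].strip()} for r in rows]

print(find_active_users())
```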
“I would not be surprised if we see eventually 2x-100x improvements or more on agent loop and tool calling efficiency scores from this design,” he added, calling it a “subtle but absolutely huge change”.