OpenAI and Anthropic are going head-to-head this week, with both companies releasing new coding models on the same day, pushing AI to move beyond code generation toward long-running, agentic work on computers.
Anthropic, on February 5, unveiled Claude Opus 4.6, an upgrade to its Opus-class model that introduces stronger coding and agentic capabilities, along with a one-million token context window in beta.
“We’re upgrading our smartest model,” Anthropic said in its blog, adding that Opus 4.6 “plans more carefully, sustains agentic tasks for longer, and operates more reliably in larger codebases.”
Claude Opus 4.6 is available on claude.ai, through Anthropic’s API, and across major cloud platforms. Developers can access the model using Claude Opus 4.6 via the Claude API. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, the company said.
Meanwhile, OpenAI announced GPT-5.3-Codex, which it described as its “most capable agentic coding model to date,” expanding Codex from a code-writing assistant into a system capable of performing a wide range of professional tasks on a computer.
“GPT-5.3-Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer,” the company said in a blog.
The model is available to users on paid ChatGPT plans across all Codex surfaces, including the app, command-line interface, IDE extensions, and the web. OpenAI said it is working to enable API access to the model at a later date, subject to safety reviews.
Anthropic said Claude Opus 4.6 addresses what it described as “context rot,” where model quality declines as conversations grow longer. “Opus 4.6 is much better at retrieving relevant information from large sets of documents. This extends to long-context tasks, where it holds and tracks information over hundreds of thousands of tokens with less drift,” the company said.
“Opus 4.6 can also apply its improved abilities to a range of everyday work tasks: running financial analyses, doing research, and using and creating documents, spreadsheets, and presentations. Within Cowork, where Claude can multitask autonomously, Opus 4.6 can put all these skills to work on your behalf,” Anthropic added in its post.
On benchmarks like Terminal-Bench 2.0, OSWorld, and BrowseComp, Opus 4.6 beat GPT-5.2 and Gemini 3 Pro on agentic coding and terminal coding.
Similarly, according to OpenAI, GPT-5.3-Codex combines the coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2, while running 25% faster. The company said this allows the model to handle long-running tasks involving research, tool use, and complex execution while remaining interactive.
OpenAI said GPT-5.3-Codex achieved the highest scores on SWE-Bench Pro and Terminal-Bench, while also showing strong performance on OSWorld and GDPval.
Beyond coding, OpenAI said GPT-5.3-Codex supports work across the software lifecycle, including debugging, deployment, monitoring, writing product documents, editing copy, user research, and data analysis.
OpenAI said GPT-5.3-Codex was also used internally during its own development. “The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations—our team was blown away by how much Codex was able to accelerate its own development,” the company said.
Anthropic also introduced updates to its Claude developer platform, including adaptive thinking, which allows the model to decide when deeper reasoning is needed; new effort controls to balance intelligence, speed, and cost; and context compaction in beta, which summarises older context to support longer-running tasks.
Moreover, the company added that it has added new features across Claude and Claude Code that allow knowledge workers and developers to handle more complex tasks using familiar tools. As part of the update, the company introduced agent teams in Claude Code as a research preview, enabling users to run multiple agents in parallel that coordinate autonomously.
Anthropic said the feature is designed for tasks that can be split into independent, read-heavy work, such as codebase reviews, and allows users to take control of individual subagents directly.
ALSO READ: Data as the New Diagnostic: How Ahead Health is Turning Algorithms Into Preventive Care