OpenAI has released GPT-5.1-Codex-Max, a new agentic coding model designed for long-running software development tasks, and made it available across all Codex surfaces.
The model is built on an updated reasoning foundation and is trained on agentic tasks across software engineering, math, research, and more, according to OpenAI. It is the company’s first system trained to operate across multiple context windows through a process called compaction, enabling it to maintain coherence over millions of tokens during a single task.
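OpenAI has not published how compaction works internally, but the general idea of collapsing older context into a summary once a token budget is reached can be sketched as follows. All names and the token heuristic here are illustrative assumptions, not OpenAI's actual mechanism:

```python
# A minimal sketch of context "compaction": when a long-running
# session approaches its token budget, older turns are collapsed
# into a short summary so work can continue past the window limit.
# (Illustrative only; the real mechanism is not public.)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Collapse older entries into one summary line once the
    total token estimate exceeds the budget, keeping the most
    recent turns verbatim."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in for a model-generated summary of the older turns.
    summary = f"[summary of {len(older)} earlier steps]"
    return [summary] + recent

history = [f"step {i}: edited file, ran tests" for i in range(100)]
compacted = compact(history, budget=200)
print(len(compacted))  # the 98 oldest steps collapse into one summary line
```

In a real agent loop this compaction step would run repeatedly, which is what lets a session remain coherent across millions of tokens even though no single context window holds them all.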
OpenAI said the model can independently run for hours, adding that internal tests saw Codex-Max “persistently iterate on its implementation, fix test failures, and ultimately deliver a successful result” over tasks that ran for more than 24 hours.
The new model is accessible to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Developers using Codex CLI via API key will receive access when API support rolls out. GPT-5.1-Codex-Max now replaces GPT-5.1-Codex as the default in all Codex interfaces.
OpenAI said 95% of its internal engineering team uses Codex weekly and that engineers “ship roughly 70% more pull requests since adopting Codex.”
Higher Accuracy and Better Token Efficiency
GPT-5.1-Codex-Max outperforms previous versions on several real-world and benchmark coding evaluations. On SWE-Lancer, it reached 79.9% accuracy, compared with 66.3% for GPT-5.1-Codex. On SWE-bench Verified, Codex-Max achieved higher accuracy at the same reasoning level while using 30% fewer thinking tokens.
OpenAI said the efficiency gains translate to lower costs for developers. In one example, the model generated a full browser-based CartPole reinforcement learning sandbox using 27,000 thinking tokens, compared with 37,000 for the earlier Codex model.
The company is also introducing a new extra-high reasoning mode for non-latency-sensitive tasks, which allows the model to think longer before producing output.
Long-Horizon Work and Windows Support
Because of compaction, GPT-5.1-Codex-Max can handle complex refactors, multi-hour debugging sessions, and extended agent loops that previously failed due to context limits. It is also the first Codex model trained to operate inside Windows environments, and it was trained on tasks specifically designed to improve collaboration inside the Codex CLI.
Safeguards and Cybersecurity
OpenAI said GPT-5.1-Codex-Max “does not reach High capability on Cybersecurity” under its Preparedness Framework, but is the most capable cybersecurity model the company has deployed so far.
OpenAI said it is preparing additional safeguards as agentic capabilities evolve, noting that it has already disrupted attempts to misuse its models in cyber operations.
Codex runs in a restricted sandbox by default, with limited file access and no network connectivity unless explicitly enabled. OpenAI recommends keeping these limits in place to avoid prompt-injection risks.
“Codex should be treated as an additional reviewer and not a replacement for human reviews,” the company said, adding that developers should examine all generated changes before deployment.
