OpenAI has released GPT-5.1-Codex-Max, a new agentic coding model designed for long-running software development tasks, and made it available across all Codex surfaces.
The model is built on an updated reasoning foundation and is trained on agentic tasks across software engineering, math, research, and more, according to OpenAI. It is the company’s first system trained to operate across multiple context windows through a process called compaction, enabling it to maintain coherence over millions of tokens during a single task.
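OpenAI has not published how compaction works internally, but the general idea of collapsing older context into a summary once a token budget is reached can be sketched as follows. All names and the token heuristic here are illustrative assumptions, not OpenAI's actual mechanism:

```python
# A minimal sketch of context "compaction": when a long-running
# session approaches its token budget, older turns are collapsed
# into a short summary so work can continue past the window limit.
# (Illustrative only; the real mechanism is not public.)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Collapse older entries into one summary line once the
    total token estimate exceeds the budget, keeping the most
    recent turns verbatim."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in for a model-generated summary of the older turns.
    summary = f"[summary of {len(older)} earlier steps]"
    return [summary] + recent

history = [f"step {i}: edited file, ran tests" for i in range(100)]
compacted = compact(history, budget=200)
print(len(compacted))  # the 98 oldest steps collapse into one summary line
```

In a real agent loop this compaction step would run repeatedly, which is what lets a session remain coherent across millions of tokens even though no single context window holds them all.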
OpenAI said the model can independently run for hours, adding that internal tests saw Codex-Max “persistently iterate on its implementation, fix test failures, and ultimately deliver a successful result” over tasks that ran for more than 24 hours.
The new model is accessible to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Developers using Codex CLI via API key will receive access when API support rolls out. GPT-5.1-Codex-Max now replaces GPT-5.1-Codex as the default in all Codex interfaces.
OpenAI said 95% of its internal engineering team uses Codex weekly and that engineers “ship roughly 70% more pull requests since adopting Codex.”
Higher Accuracy and Better Token Efficiency
GPT-5.1-Codex-Max outperforms previous versions on several real-world and benchmark coding evaluations. On SWE-Lancer, it reached 79.9% accuracy, compared with 66.3% for GPT-5.1-Codex. On SWE-bench Verified, Codex-Max achieved higher accuracy at the same reasoning level while using 30% fewer thinking tokens.
OpenAI said the efficiency gains translate to lower costs for developers. In one example, the model generated a full browser-based CartPole reinforcement learning sandbox using 27,000 thinking tokens, compared with 37,000 for the earlier Codex model.
The company is also introducing a new extra-high reasoning mode for non-latency-sensitive tasks, which allows the model to think longer before producing output.
Long-Horizon Work and Windows Support
Because of compaction, GPT-5.1-Codex-Max can handle complex refactors, multi-hour debugging sessions, and extended agent loops that previously failed due to context limits. It is also the first Codex model trained to operate inside Windows environments, and it was trained on tasks specifically designed to improve collaboration inside the Codex CLI.
Safeguards and Cybersecurity
OpenAI said GPT-5.1-Codex-Max “does not reach High capability on Cybersecurity” under its Preparedness Framework, but is the most capable cybersecurity model the company has deployed so far.
OpenAI said it is preparing additional safeguards as agentic capabilities evolve, noting that it has already disrupted attempts to misuse its models in cyber operations.
Codex runs in a restricted sandbox by default, with limited file access and no network connectivity unless explicitly enabled. OpenAI recommends keeping these limits in place to avoid prompt-injection risks.
“Codex should be treated as an additional reviewer and not a replacement for human reviews,” the company said, adding that developers should examine all generated changes before deployment.
