Cursor Finds GPT-5.2 Better Than Claude Opus 4.5 for Long Autonomous Tasks

Cursor says it has found OpenAI’s GPT-5.2 models to be significantly more reliable than Anthropic’s Claude Opus 4.5 for long-running, autonomous coding tasks.

On the same day, Cursor also made the GPT-5.2 model available on its platform.

The finding emerged when the team set out to build a web browser from scratch using Cursor. CEO Michael Truell said on X that the browser’s rendering engine was written in Rust, with support for HTML parsing, CSS cascade and layout, text shaping, painting, and a custom JavaScript virtual machine.

“It kind of works,” Truell wrote. “It still has issues and is, of course, very far from WebKit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.”

Cursor has released the code on GitHub.

In a research blog post published this week, Cursor described the browser as part of a broader effort to test whether autonomous coding agents can scale to projects “that typically take human teams months to complete.”

Cursor stated that while building the browser, “We found that GPT-5.2 models are much better at extended autonomous work: following instructions, keeping focus, avoiding drift, and implementing things precisely and completely.”

By contrast, “Opus 4.5 tends to stop earlier and take shortcuts when convenient, yielding back control quickly,” Cursor said.

Other long-running experiments include a multi-week, in-place migration of Cursor’s own codebase from Solid to React, involving 266,000 lines added and 193,000 lines removed; a Java Language Server Protocol project with 7,400 commits and 550,000 lines of code; a Windows 7 emulator exceeding 1.2 million lines; and an Excel-like system reaching 1.6 million lines.

In another case, Cursor said a long-running agent rewrote a video-rendering pipeline in Rust, making it “25× faster” while also adding smooth zooming, panning, and motion-blur effects.
