OpenAI has released a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding inside Codex.
The model is rolling out to ChatGPT Pro users and is the first deployment under OpenAI’s partnership with Cerebras, announced in January.
For Pro users, Codex-Spark is available in the latest versions of the Codex app, CLI, and VS Code extension. A limited set of API design partners will also gain access as OpenAI studies integration patterns under production workloads, with broader access planned in the coming weeks.
The model is optimised for low-latency inference and delivers more than 1,000 tokens per second on Cerebras hardware.
OpenAI said the model is built for interactive coding tasks such as making targeted edits, adjusting logic, and refining interfaces with immediate feedback.
“We’re sharing Codex-Spark on Cerebras as a research preview to ChatGPT Pro users so that developers can start experimenting early while we work with Cerebras to ramp up datacenter capacity, harden the end-to-end user experience, and deploy our larger frontier models,” the company said.
The release introduces a dual-mode direction for Codex, combining long-running autonomous work and real-time collaboration. OpenAI said its frontier models can handle extended tasks over hours or days, while Codex-Spark focuses on live iteration.
Codex-Spark supports a 128,000-token context window and text-only input. During the research preview, usage will have separate rate limits and will not count toward standard limits, though access may be queued during periods of high demand.
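For a sense of what that budget means in practice, here is a minimal Python sketch that checks whether a prompt fits the preview's window. OpenAI has not published which tokenizer Codex-Spark uses, so the o200k_base encoding and the 4,096-token output reserve below are illustrative assumptions:

```python
import tiktoken

# Assumption: o200k_base is the encoding used by recent OpenAI models;
# the tokenizer for GPT-5.3-Codex-Spark has not been published.
enc = tiktoken.get_encoding("o200k_base")

CONTEXT_WINDOW = 128_000  # tokens, per the research-preview announcement


def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Return True if the prompt leaves room for the reserved output budget."""
    return len(enc.encode(prompt)) + reserved_for_output <= CONTEXT_WINDOW


print(fits_in_context("Refactor the auth middleware to use async handlers."))
```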
On SWE-Bench Pro and Terminal-Bench 2.0, benchmarks measuring agentic software engineering capability, GPT-5.3-Codex-Spark showed lower task completion times than GPT-5.3-Codex while maintaining competitive accuracy.
On Terminal-Bench 2.0, GPT-5.3-Codex-Spark recorded 58.4% accuracy, compared with 77.3% for GPT-5.3-Codex and 46.1% for GPT-5.1-Codex-mini.
OpenAI said it also reduced latency across its request-response pipeline. The company introduced a persistent WebSocket connection and modified its Responses API and inference stack.
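The WebSocket tier has not been documented publicly, so the closest public analogue is event streaming through the Responses API in OpenAI's official Python SDK. A minimal sketch, assuming a hypothetical `gpt-5.3-codex-spark` model id (per L3, API access is currently limited to design partners):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True requests incremental events rather than a single response;
# the model id below is an assumption and may differ from the real one.
stream = client.responses.create(
    model="gpt-5.3-codex-spark",
    input="Rename the variable `tmp` to `buffer` in utils.py",
    stream=True,
)

# Print text deltas as they arrive, which is where per-token latency
# improvements like Codex-Spark's would be most visible.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```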
Codex-Spark runs on the Wafer Scale Engine 3 developed by Cerebras. OpenAI said the hardware provides a low-latency serving tier integrated with its existing production stack, which continues to rely on GPUs for training and general inference.
OpenAI said GPUs remain central to its infrastructure and that Cerebras’ hardware complements them in workflows requiring low latency. The company added that both systems can be combined within a single workload.
OpenAI said future updates will expand the Codex-Spark family with larger models, longer context windows, and multimodal input. Over time, the company plans to blend real-time collaboration with background task delegation so users can shift between interactive coding and longer-running processes within the same workflow.