OpenAI has released GPT-5.4 mini and GPT-5.4 nano, two smaller models designed for high-throughput, latency-sensitive applications such as coding assistants and multi-agent systems.
The models extend the GPT-5.4 family into lower-cost tiers, with the company positioning them as optimised for workloads where responsiveness and cost efficiency matter more than raw model size.
GPT-5.4 mini significantly outperforms GPT-5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than twice as fast. It also approaches the performance of the larger GPT-5.4 model on several benchmarks, including SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano, the smallest variant, is targeted at simpler tasks such as classification, ranking, and data extraction, as well as lightweight coding subagents.
“These models are built for the kinds of workloads where latency directly shapes the product experience,” OpenAI said in its official blog post. The company highlighted use cases such as real-time coding assistants, multimodal applications and systems that interpret user interfaces.
In benchmarks, GPT-5.4 mini delivers a strong performance-to-latency profile. On SWE-Bench Pro, it achieves 54.4% accuracy, compared to 57.7% for the full GPT-5.4 model, while significantly outperforming GPT-5 mini at 45.7%.
The model also shows gains across tool use and reasoning benchmarks, including Toolathlon and GPQA Diamond, where it approaches the performance of larger models while maintaining lower latency and cost.
OpenAI said GPT-5.4 mini “consistently outperforms GPT-5-mini at similar latencies and approaches GPT-5.4-level pass rates while running much faster,” positioning it as a strong candidate for production systems that require rapid iteration.
GPT-5.4 nano trades further capability for cost efficiency: its benchmark scores are lower, but its significantly reduced pricing makes it suitable for high-volume, low-complexity tasks.
OpenAI is also aligning the models with emerging multi-agent system design patterns. GPT-5.4 mini is positioned as a “subagent” model that can handle narrower tasks such as codebase search, file analysis or document processing, under the coordination of a larger model like GPT-5.4.
The company said this approach allows developers to split workloads across models, with larger systems handling planning and smaller models executing tasks in parallel at lower cost.
“This pattern becomes more useful as smaller models get faster and more capable,” OpenAI said, describing a shift toward composable AI systems rather than single-model deployments.
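The planner/subagent split described above can be sketched in a few lines. This is an illustrative outline only: the model identifiers, the `call_model` stub, and the task decomposition are assumptions standing in for real API calls, not OpenAI's implementation.

```python
# Sketch of the multi-agent pattern: a larger "planner" model decomposes a
# goal into narrow subtasks, which fan out to a smaller model in parallel.
# call_model is a stub; a real system would invoke the chat API here.
from concurrent.futures import ThreadPoolExecutor


def call_model(model: str, task: str) -> str:
    # Stub standing in for an actual model call.
    return f"[{model}] result for: {task}"


def plan(goal: str) -> list[str]:
    # In practice the larger model (e.g. GPT-5.4) would produce this list.
    return [
        f"search codebase for '{goal}'",
        f"analyse files related to '{goal}'",
    ]


def run(goal: str) -> list[str]:
    subtasks = plan(goal)
    # Narrow subtasks execute in parallel on the smaller, cheaper model.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: call_model("gpt-5.4-mini", t), subtasks))


results = run("rate limiter")
```

The design point is that planning stays serial on the capable model, while the latency-sensitive fan-out work runs concurrently on the faster, cheaper tier.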
GPT-5.4 mini is available across OpenAI’s API, Codex and ChatGPT, supporting text and image inputs, tool use and long-context reasoning with a 400,000-token context window. It is priced at $0.75 per million input tokens and $4.50 per million output tokens.
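At those rates, per-request cost is simple to estimate. The token counts below are made-up example figures; only the per-million-token prices come from the announcement.

```python
# Per-token rates derived from the quoted GPT-5.4 mini pricing:
# $0.75 per 1M input tokens, $4.50 per 1M output tokens.
INPUT_RATE = 0.75 / 1_000_000
OUTPUT_RATE = 4.50 / 1_000_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


# Example: 8,000 input tokens and 1,000 output tokens
# costs $0.006 + $0.0045 = $0.0105, about one cent per request.
cost = request_cost(8_000, 1_000)
```

Output tokens dominate the bill at these rates, so long generations cost proportionally more than long prompts.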
GPT-5.4 nano is available via the API at lower pricing tiers, targeting cost-sensitive deployments.