OpenAI has launched a new set of voice intelligence features in its API, introducing three real-time audio models designed to make AI voice conversations more natural, responsive and capable of handling complex tasks as people speak.
The company announced the launch of GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, aimed at developers building voice assistants, customer support systems, translation tools and live transcription services.
“Voice is becoming one of the most natural ways people interact with software, especially while multitasking, travelling or seeking support in different languages. The new models are designed to help AI systems not only respond faster but also understand context, reason through requests, and continue conversations more naturally,” OpenAI stated.
GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning abilities. The model can handle interruptions, recover from mistakes, call multiple tools at once and maintain longer conversations using a larger 128K context window. Developers can also adjust the reasoning level based on the complexity of a task.
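Based on the article's description, selecting the model and an adjustable reasoning level might look like the sketch below. This is a hypothetical illustration: the `session.update` event shape is modelled on OpenAI's existing Realtime API, and the `reasoning_effort` field name and its values are assumptions, not documented parameters.

```python
import json

def build_session_update(model: str, reasoning_effort: str) -> str:
    """Build a hypothetical session.update event selecting a Realtime model
    and a reasoning level (the field name is an assumption)."""
    if reasoning_effort not in {"low", "medium", "high"}:
        raise ValueError("unsupported reasoning level")
    event = {
        "type": "session.update",
        "session": {
            "model": model,
            # Assumed knob for matching reasoning depth to task complexity
            "reasoning_effort": reasoning_effort,
        },
    }
    return json.dumps(event)

payload = build_session_update("gpt-realtime-2", "high")
```

In practice, a payload like this would be sent over the API's WebSocket connection; the sketch only shows how the configuration might be assembled.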
The company said the model supports more human-like interactions by allowing AI agents to use phrases such as “let me check that” while processing requests. It added that the model performs significantly better than earlier versions on live audio intelligence and instruction-following benchmarks.
OpenAI also introduced GPT-Realtime-Translate, a live translation model that supports over 70 input languages and 13 output languages. The model translates conversations in real time while keeping pace with the speaker. The tool could be useful for customer support, education, travel, media and global business communication.
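A developer targeting a particular output language might configure a translation session along these lines. Again, this is a sketch under assumptions: the `output_language` parameter name and event shape are hypothetical, and the article does not enumerate which 13 output languages are supported.

```python
def build_translate_session(target_lang: str) -> dict:
    """Return a session.update-style event requesting live translation
    into `target_lang` (parameter name assumed, not documented)."""
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            # Assumed field: one of the 13 supported output languages
            "output_language": target_lang,
        },
    }

event = build_translate_session("hi")
```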
Prateek Sachan, Co-founder and CTO of BolnaAI, said, “The translation model performed well across Indian languages. In our evaluations across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered lower word error rates.”
The third model, GPT-Realtime-Whisper, is a low-latency speech-to-text system that transcribes conversations in real time as users speak. The model can be used for captions, meeting notes, customer support workflows and healthcare or recruitment applications.
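For a real-time transcription stream like the one described above, a client typically folds incremental events into a running caption. The sketch below assumes delta-style streaming events; the event type names (`transcript.delta`, `transcript.done`) are illustrative assumptions, not confirmed API events.

```python
import json

def accumulate_transcript(events):
    """Fold streamed transcription events into a caption string.
    Event names are assumptions modelled on delta-style streaming."""
    caption = []
    for raw in events:
        event = json.loads(raw)
        if event.get("type") == "transcript.delta":
            caption.append(event["delta"])  # partial text as the user speaks
        elif event.get("type") == "transcript.done":
            break  # final event closes out the utterance
    return "".join(caption)

stream = [
    json.dumps({"type": "transcript.delta", "delta": "Hello, "}),
    json.dumps({"type": "transcript.delta", "delta": "world."}),
    json.dumps({"type": "transcript.done"}),
]
# accumulate_transcript(stream) -> "Hello, world."
```

A pattern like this is what would feed live captions or meeting notes as audio is transcribed.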
Josh Weisberg, Senior Vice President and Head of AI at Zillow, said the new model significantly improved the company’s voice AI performance. “What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions. On our hardest adversarial benchmark, this translates to a 26-point lift in call success rate after prompt optimisation,” he said.
The new voice models are now available through OpenAI’s Realtime API, with built-in safeguards against misuse and support for enterprise privacy and EU data-residency requirements.
