OpenAI has launched a new set of voice intelligence features in its API, introducing three real-time audio models designed to make AI voice conversations more natural, responsive and capable of handling complex tasks as people speak.
The company announced the launch of GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, aimed at developers building voice assistants, customer support systems, translation tools and live transcription services.
“Voice is becoming one of the most natural ways people interact with software, especially while multitasking, travelling or seeking support in different languages. The new models are designed to help AI systems not only respond faster but also understand context, reason through requests, and continue conversations more naturally,” OpenAI stated.
GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning abilities. The model can handle interruptions, recover from mistakes, call multiple tools at once and maintain longer conversations using a larger 128K context window. Developers can also adjust the reasoning level based on the complexity of a task.
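Based on the article's description, selecting the model and an adjustable reasoning level might look like the sketch below. This is a hypothetical illustration: the `session.update` event shape is modelled on OpenAI's existing Realtime API, and the `reasoning_effort` field name and its values are assumptions, not documented parameters.

```python
import json

def build_session_update(model: str, reasoning_effort: str) -> str:
    """Build a hypothetical session.update event selecting a Realtime model
    and a reasoning level (the field name is an assumption)."""
    if reasoning_effort not in {"low", "medium", "high"}:
        raise ValueError("unsupported reasoning level")
    event = {
        "type": "session.update",
        "session": {
            "model": model,
            # Assumed knob for matching reasoning depth to task complexity
            "reasoning_effort": reasoning_effort,
        },
    }
    return json.dumps(event)

payload = build_session_update("gpt-realtime-2", "high")
```

In practice, a payload like this would be sent over the API's WebSocket connection; the sketch only shows how the configuration might be assembled.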
The company said the model supports more human-like interactions by allowing AI agents to use phrases such as “let me check that” while processing requests. It added that the model performs significantly better than earlier versions on live audio intelligence and instruction-following benchmarks.
OpenAI also introduced GPT-Realtime-Translate, a live translation model that supports over 70 input languages and 13 output languages. The model translates conversations in real time while keeping pace with the speaker. The tool could be useful for customer support, education, travel, media and global business communication.
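A developer targeting a particular output language might configure a translation session along these lines. Again, this is a sketch under assumptions: the `output_language` parameter name and event shape are hypothetical, and the article does not enumerate which 13 output languages are supported.

```python
def build_translate_session(target_lang: str) -> dict:
    """Return a session.update-style event requesting live translation
    into `target_lang` (parameter name assumed, not documented)."""
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            # Assumed field: one of the 13 supported output languages
            "output_language": target_lang,
        },
    }

event = build_translate_session("hi")
```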
Prateek Sachan, Co-founder and CTO of BolnaAI, said, “The translation model performed well across Indian languages. In our evaluations across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered lower word error rates.”
The third model, GPT-Realtime-Whisper, is a low-latency speech-to-text system that transcribes conversations in real time as users speak. The model can be used for captions, meeting notes, customer support workflows and healthcare or recruitment applications.
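For a real-time transcription stream like the one described above, a client typically folds incremental events into a running caption. The sketch below assumes delta-style streaming events; the event type names (`transcript.delta`, `transcript.done`) are illustrative assumptions, not confirmed API events.

```python
import json

def accumulate_transcript(events):
    """Fold streamed transcription events into a caption string.
    Event names are assumptions modelled on delta-style streaming."""
    caption = []
    for raw in events:
        event = json.loads(raw)
        if event.get("type") == "transcript.delta":
            caption.append(event["delta"])  # partial text as the user speaks
        elif event.get("type") == "transcript.done":
            break  # final event closes out the utterance
    return "".join(caption)

stream = [
    json.dumps({"type": "transcript.delta", "delta": "Hello, "}),
    json.dumps({"type": "transcript.delta", "delta": "world."}),
    json.dumps({"type": "transcript.done"}),
]
# accumulate_transcript(stream) -> "Hello, world."
```

A pattern like this is what would feed live captions or meeting notes as audio is transcribed.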
Josh Weisberg, Senior Vice President and Head of AI at Zillow, said the new model significantly improved the company’s voice AI performance. “What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions. On our hardest adversarial benchmark, this translates to a 26-point lift in call success rate after prompt optimisation,” he said.
The new voice models are now available through OpenAI’s Realtime API, with built-in safeguards against misuse and support for enterprise privacy and EU data-residency requirements.
