
OpenAI Unveils Advanced Voice Intelligence Models in Realtime API Update

The San Francisco-based firm introduces tools designed to move voice interfaces beyond simple call-and-response, targeting sectors from customer service to creator platforms while implementing strict safety guardrails.

By Owen Mercer, Markets and Finance Editor
Source: TechCrunch
OpenAI launches new voice intelligence features in its API
New capabilities include GPT-5-class reasoning, multi-language translation, and live transcription for enterprise applications.

OpenAI has announced the integration of new voice intelligence features within its Realtime API, marking a significant expansion of its developer tools. The update introduces three distinct models designed to transform real-time audio interactions from basic exchanges into sophisticated interfaces capable of listening, reasoning, translating, transcribing, and taking action.

The flagship addition is GPT-Realtime-2, a voice model built with GPT-5-class reasoning to handle complex user requests and produce natural-sounding speech. It marks a clear step up from its predecessor, GPT-Realtime-1.5, particularly in managing intricate conversational flows that demand deeper reasoning.

Complementing the reasoning capabilities are two new utilities focused on accessibility and documentation. GPT-Realtime-Translate offers real-time language conversion, accepting more than 70 input languages and delivering responses in 13 output languages quickly enough to keep pace with live conversation. Alongside it, GPT-Realtime-Whisper provides live speech-to-text transcription, capturing interactions as they happen to ensure accurate record-keeping during calls.
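To make the pairing concrete, here is a minimal sketch of how a developer might request both capabilities in one session. The model names come from the announcement, but the payload shape, field names, and the `session.update` event style are assumptions for illustration, not a confirmed API schema.

```python
# Hypothetical sketch of a Realtime API session request combining the
# translation and transcription models described above. Field names and
# event shape are assumptions, not confirmed OpenAI API schema.

def build_session_config(input_lang: str, output_lang: str) -> dict:
    """Assemble a session.update-style payload requesting live
    translation plus a parallel transcription stream (hypothetical)."""
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",    # name per announcement
            "translation": {
                "input_language": input_lang,     # one of 70+ inputs
                "output_language": output_lang,   # one of 13 outputs
            },
            "transcription": {
                "model": "gpt-realtime-whisper",  # live speech-to-text
                "enabled": True,
            },
        },
    }

config = build_session_config("de", "en")
print(config["session"]["translation"]["output_language"])  # → en
```

In practice such a payload would be sent over the API's streaming connection; the sketch only shows how the two models could be configured side by side.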

Pricing structures for these new tools vary based on usage metrics. OpenAI states that the translation and transcription features will be billed by the minute, whereas GPT-Realtime-2 follows a token consumption model. The company emphasises that these updates are intended for a broad range of applications, extending beyond obvious targets like customer service systems to include education, media, events, and creator platforms.
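The split billing model is easiest to see with a back-of-the-envelope estimate. The rates below are placeholders invented for illustration only; OpenAI has not published the prices assumed here.

```python
# Illustrating the two billing models described in the announcement:
# per-minute for translation/transcription, per-token for GPT-Realtime-2.
# All rates are made-up placeholders, not OpenAI's published prices.

def minute_billed_cost(minutes: float, rate_per_minute: float) -> float:
    """Per-minute billing, as described for translation/transcription."""
    return minutes * rate_per_minute

def token_billed_cost(tokens: int, rate_per_million: float) -> float:
    """Token-based billing, as described for GPT-Realtime-2."""
    return tokens / 1_000_000 * rate_per_million

call_cost = minute_billed_cost(30, 0.06)          # 30-minute translated call
reasoning_cost = token_billed_cost(50_000, 4.00)  # 50k tokens of reasoning
print(round(call_cost, 2), round(reasoning_cost, 2))  # → 1.8 0.2
```

The practical difference is that per-minute costs scale with call duration regardless of content, while token billing scales with how much the model actually processes and generates.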

To mitigate potential risks associated with such powerful voice interfaces, OpenAI has embedded specific safety triggers within the system. These guardrails automatically halt a conversation when it is flagged as violating harmful-content guidelines, aiming to prevent misuse for spam, fraud, and other forms of online abuse.
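From a developer's perspective, an automatic halt is something the client has to handle gracefully. The sketch below assumes a hypothetical `conversation.halted` event; the article only states that the system can stop a flagged conversation, so the event name and fields are invented for illustration.

```python
# Hypothetical client-side handling of the safety guardrail described
# above. The "conversation.halted" event name and its fields are
# assumptions; the announcement does not specify how a halt surfaces.

def handle_event(event: dict) -> str:
    """Route a server event, treating a safety halt as terminal."""
    if event.get("type") == "conversation.halted":
        reason = event.get("reason", "policy_violation")
        return f"session ended by safety guardrail: {reason}"
    return "continue"

print(handle_event({"type": "conversation.halted", "reason": "spam"}))
# → session ended by safety guardrail: spam
```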

This development occurs alongside a broader industry shift regarding embedded artificial intelligence. Recent adjustments by competitors, such as Google allowing Chrome users to disable the Gemini Nano AI model to address privacy and storage concerns, highlight the growing focus on user consent and control in the deployment of advanced AI technologies.

