
OpenAI Unveils Advanced Voice Intelligence Models in Realtime API Update

The San Francisco-based firm introduces tools designed to move voice interfaces beyond simple call-and-response, targeting sectors from customer service to creator platforms while implementing strict safety guardrails.

By Owen Mercer, Markets and Finance Editor
Source: TechCrunch
OpenAI launches new voice intelligence features in its API
New capabilities include GPT-5-class reasoning, multi-language translation, and live transcription for enterprise applications.

OpenAI has announced the integration of new voice intelligence features within its Realtime API, marking a significant expansion of its developer tools. The update introduces three distinct models designed to transform real-time audio interactions from basic exchanges into sophisticated interfaces capable of listening, reasoning, translating, transcribing, and taking action.

The flagship addition is GPT-Realtime-2, a voice model built with GPT-5-class reasoning to handle complex user requests and produce natural-sounding speech. It marks a clear step up from its predecessor, GPT-Realtime-1.5, particularly in managing intricate conversational flows that demand deeper reasoning.

Complementing the reasoning capabilities are two new utilities focused on accessibility and documentation. GPT-Realtime-Translate offers real-time language conversion, accepting more than 70 input languages and delivering responses in 13 output languages quickly enough to keep pace with live conversation. Alongside it, GPT-Realtime-Whisper provides live speech-to-text transcription, capturing interactions as they happen to ensure accurate record-keeping during calls.
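To make the pairing concrete, here is a minimal sketch of how a developer might request both capabilities in one session. The model names come from the announcement, but the payload shape, field names, and the `session.update` event style are assumptions for illustration, not a confirmed API schema.

```python
# Hypothetical sketch of a Realtime API session request combining the
# translation and transcription models described above. Field names and
# event shape are assumptions, not confirmed OpenAI API schema.

def build_session_config(input_lang: str, output_lang: str) -> dict:
    """Assemble a session.update-style payload requesting live
    translation plus a parallel transcription stream (hypothetical)."""
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",    # name per announcement
            "translation": {
                "input_language": input_lang,     # one of 70+ inputs
                "output_language": output_lang,   # one of 13 outputs
            },
            "transcription": {
                "model": "gpt-realtime-whisper",  # live speech-to-text
                "enabled": True,
            },
        },
    }

config = build_session_config("de", "en")
print(config["session"]["translation"]["output_language"])  # → en
```

In practice such a payload would be sent over the API's streaming connection; the sketch only shows how the two models could be configured side by side.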

Pricing structures for these new tools vary based on usage metrics. OpenAI states that the translation and transcription features will be billed by the minute, whereas GPT-Realtime-2 follows a token consumption model. The company emphasises that these updates are intended for a broad range of applications, extending beyond obvious targets like customer service systems to include education, media, events, and creator platforms.
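The split billing model is easiest to see with a back-of-the-envelope estimate. The rates below are placeholders invented for illustration only; OpenAI has not published the prices assumed here.

```python
# Illustrating the two billing models described in the announcement:
# per-minute for translation/transcription, per-token for GPT-Realtime-2.
# All rates are made-up placeholders, not OpenAI's published prices.

def minute_billed_cost(minutes: float, rate_per_minute: float) -> float:
    """Per-minute billing, as described for translation/transcription."""
    return minutes * rate_per_minute

def token_billed_cost(tokens: int, rate_per_million: float) -> float:
    """Token-based billing, as described for GPT-Realtime-2."""
    return tokens / 1_000_000 * rate_per_million

call_cost = minute_billed_cost(30, 0.06)          # 30-minute translated call
reasoning_cost = token_billed_cost(50_000, 4.00)  # 50k tokens of reasoning
print(round(call_cost, 2), round(reasoning_cost, 2))  # → 1.8 0.2
```

The practical difference is that per-minute costs scale with call duration regardless of content, while token billing scales with how much the model actually processes and generates.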

To mitigate potential risks associated with such powerful voice interfaces, OpenAI has embedded specific safety triggers within the system. These guardrails automatically halt a conversation when it is flagged as violating harmful-content guidelines, aiming to prevent misuse for spam, fraud, and other forms of online abuse.
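From a developer's perspective, an automatic halt is something the client has to handle gracefully. The sketch below assumes a hypothetical `conversation.halted` event; the article only states that the system can stop a flagged conversation, so the event name and fields are invented for illustration.

```python
# Hypothetical client-side handling of the safety guardrail described
# above. The "conversation.halted" event name and its fields are
# assumptions; the announcement does not specify how a halt surfaces.

def handle_event(event: dict) -> str:
    """Route a server event, treating a safety halt as terminal."""
    if event.get("type") == "conversation.halted":
        reason = event.get("reason", "policy_violation")
        return f"session ended by safety guardrail: {reason}"
    return "continue"

print(handle_event({"type": "conversation.halted", "reason": "spam"}))
# → session ended by safety guardrail: spam
```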

This development occurs alongside a broader industry shift regarding embedded artificial intelligence. Recent adjustments by competitors, such as Google allowing Chrome users to disable the Gemini Nano AI model to address privacy and storage concerns, highlight the growing focus on user consent and control in the deployment of advanced AI technologies.

