OpenAI unveils three audio models for real-time voice tasks


FILE PHOTO: OpenAI logo is seen in this illustration taken May 20, 2024. REUTERS/Dado Ruvic/Illustration/File Photo

May 7 (Reuters) - OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch of the application programming interface (API) moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage more complex requests, call tools, handle interruptions and maintain context across longer voice sessions.

GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens; GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper $0.017 per minute.
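For illustration, the quoted rates translate into costs as follows (a minimal sketch using hypothetical usage figures; only the rates come from the article):

```python
# Published rates as quoted above
REALTIME2_PER_M_INPUT_TOKENS = 32.00   # USD per million audio input tokens
TRANSLATE_PER_MINUTE = 0.034           # USD per minute
WHISPER_PER_MINUTE = 0.017             # USD per minute

def realtime2_input_cost(audio_input_tokens: int) -> float:
    """Cost of audio input tokens for GPT-Realtime-2."""
    return audio_input_tokens / 1_000_000 * REALTIME2_PER_M_INPUT_TOKENS

def per_minute_cost(minutes: float, rate: float) -> float:
    """Cost for the per-minute-billed models."""
    return minutes * rate

# Hypothetical usage: 2.5M audio input tokens, 90 minutes each of
# translation and transcription
print(f"{realtime2_input_cost(2_500_000):.2f}")               # 80.00
print(f"{per_minute_cost(90, TRANSLATE_PER_MINUTE):.2f}")     # 3.06
print(f"{per_minute_cost(90, WHISPER_PER_MINUTE):.2f}")       # 1.53
```

At these rates, an hour-plus voice session remains in the single-digit-dollar range for the per-minute models, while GPT-Realtime-2 costs scale with audio token volume rather than wall-clock time.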

(Reporting by Anhata Rooprai in Bengaluru; Editing by Vijay Kishore)
