[openai-blog] Introducing next-generation audio models in the API

SEV-3 OpenAI

2026-05-10, 2 sources, standard

OpenAI has released next-generation audio models in its API, expanding its speech capabilities beyond the existing text-to-speech and Whisper transcription services [source].

The new models include an updated text-to-speech system and an enhanced speech-to-text model. The text-to-speech offering features improved voice quality and expanded language support, while the speech-to-text model delivers faster processing and higher accuracy across multiple languages.

OpenAI states the audio models are now available through its API for developers building voice-enabled applications. The company has positioned these releases as part of its ongoing effort to provide multimodal AI capabilities alongside its text-based models.

The announcement arrives as audio AI becomes increasingly central to consumer and enterprise applications. Competitors including Google, Anthropic, and Microsoft have also invested in speech synthesis and recognition capabilities, though implementation approaches vary across providers.

OpenAI has not disclosed specific technical details about the underlying architectures or training data used for the new models. The company indicated that pricing information and usage limits are available through its API documentation.

Developers using OpenAI's existing audio endpoints may need to review integration requirements, as the new models introduce different API parameters and response formats. OpenAI has not specified whether legacy audio models will remain available or face deprecation.
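One way to limit the impact of changed response formats is to route every transcription result through a single normalizing function, so the rest of an application never depends on a particular response shape. The following is a minimal sketch under stated assumptions, not OpenAI's documented interface: the function name `extract_transcript` is hypothetical, and the three response shapes it accepts (a plain string, a mapping with a `text` key, an object with a `text` attribute) are illustrative guesses rather than formats confirmed by the announcement.

```python
from typing import Any


def extract_transcript(response: Any) -> str:
    """Return transcript text from a response whose shape may vary by model.

    The accepted shapes are assumptions made for illustration only;
    check the provider's API documentation for the actual formats.
    """
    # Assumed shape 1: the response is already plain text.
    if isinstance(response, str):
        return response.strip()

    # Assumed shape 2: a JSON-like mapping carrying the text under "text".
    if isinstance(response, dict) and isinstance(response.get("text"), str):
        return response["text"].strip()

    # Assumed shape 3: an SDK-style object exposing a .text attribute.
    text = getattr(response, "text", None)
    if isinstance(text, str):
        return text.strip()

    # Fail loudly on anything unrecognized instead of guessing.
    raise ValueError(f"Unrecognized transcription response: {type(response).__name__}")
```

With this in place, a format change in a newer model would require updating only this function, and an unrecognized shape raises an error instead of silently producing empty text.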

The release follows OpenAI's pattern of incremental capability expansion across its model portfolio. No independent benchmarks comparing the new audio models to previous versions or competitor offerings were available at the time of announcement.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. The editor reviewing the archive may amend the rationale per wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI