[openai-blog] Hello GPT-4o
OpenAI announced GPT-4o on 13 May 2024, describing it as a new flagship model that accepts any combination of text, audio, image, and video inputs and produces any combination of text, audio, and image outputs [source]. The company stated that GPT-4o matches GPT-4 Turbo performance on text, reasoning, and coding intelligence while offering "much faster" response times and improved capabilities in vision and audio understanding.
According to the announcement, GPT-4o responds to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which OpenAI described as similar to human response time in conversation. The model scores 87.2% on the traditional 5-shot, no-CoT MMLU setting and 88.7% on 0-shot CoT MMLU, both of which OpenAI reported as new high scores on general knowledge questions. OpenAI also stated that GPT-4o is "especially better at vision and audio understanding compared to existing models."
OpenAI said the model would be rolled out iteratively. Text and image capabilities began rolling out in ChatGPT on the announcement date, with GPT-4o available in the free tier and to Plus users with up to 5x higher message limits; developers could also access GPT-4o in the API as a text and vision model. OpenAI stated that audio and video capabilities would be released to "a small group of trusted partners" in the API in the coming weeks.
OpenAI described GPT-4o as natively multimodal, trained end-to-end across text, vision, and audio. The company noted that its earlier Voice Mode chained three separate models (audio-to-text transcription, a text-only GPT-3.5 or GPT-4 step, and text-to-audio synthesis), so the main model lost information such as tone, multiple speakers, and background noise. In GPT-4o, all inputs and outputs are processed by a single neural network.
The announcement included benchmark results showing GPT-4o outperforming existing models on multilingual, audio, and vision tasks. OpenAI stated the model is "our first model combining all of these modalities" and represents a step toward "much more natural human-computer interaction."
Why this is an AI incident
Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. The editor reviewing the archive may amend the rationale per wire.
Counterfactual "but-for" test per the Editor's Guide.