[openai-blog] OpenAI safety practices

SEV-3 | OpenAI

2026-05-10 2 sources standard

OpenAI published a safety update on 21 May 2024 outlining its approach to model evaluation and deployment [source]. The post describes internal processes for red-teaming, external expert review, and staged rollouts designed to identify risks before public release.

The company states it conducts adversarial testing across categories including misinformation, cybersecurity, and persuasion. External researchers are granted early access to flag issues not caught internally. OpenAI says it uses a tiered deployment framework, releasing models first to limited user groups before broader availability.

The update references the company's Preparedness Framework, which assigns risk scores to models based on capability thresholds in areas such as chemical, biological, radiological, and nuclear (CBRN) threats, as well as autonomous replication. A model that scores above "medium" risk in any one category either requires additional safeguards or cannot be deployed.
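
The threshold rule described above can be sketched in a few lines of Python. This is an illustration only, not OpenAI's implementation: the four level names, the category keys, and the deployment_gate function are assumptions introduced here, and the post itself names only the "medium" threshold.

```python
from __future__ import annotations

from enum import IntEnum


class Risk(IntEnum):
    # Assumed ordering of risk levels; the summary above names only "medium".
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def deployment_gate(
    scores: dict[str, Risk], threshold: Risk = Risk.MEDIUM
) -> tuple[bool, list[str]]:
    """Return (deployable, categories_over_threshold).

    Hypothetical helper: a model passes only if no category scores
    above the threshold. Any category above it is returned so that
    additional safeguards can be considered for it.
    """
    over = sorted(cat for cat, level in scores.items() if level > threshold)
    return (not over, over)
```

Under these assumptions, a model scored {"cbrn": Risk.HIGH, "autonomous_replication": Risk.LOW} fails the gate with "cbrn" flagged, while the same model with "cbrn" at Risk.MEDIUM passes. A single category over the threshold is enough to block deployment, which matches the "in any category" wording.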

OpenAI notes it shares safety findings with other AI providers and policymakers, though the post does not specify which findings have been shared or with whom. The company also describes ongoing work to improve model refusal behaviour and reduce false positives where models decline benign requests.

No specific incidents or failures are detailed in the post. The update appears intended to communicate existing safety protocols rather than respond to a particular event. OpenAI states it continues to refine these processes as models grow more capable.

The post does not include quantitative data on red-team success rates, refusal accuracy, or the frequency of safety interventions. It also does not address how safety measures differ between models deployed via the API and those deployed through ChatGPT, or how third-party fine-tuning affects safety guarantees.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale for each wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI