[openai-blog] Strengthening our safety ecosystem with external testing

SEV-3 OpenAI

2026-05-10 2 sources standard

OpenAI announced on 19 November 2025 that it is expanding external safety testing through partnerships with the UK AI Safety Institute (UK AISI) and the US AI Safety Institute (US AISI), granting both organizations early access to new models before public release [source].

The arrangement allows both institutes to conduct independent evaluations of upcoming OpenAI models. UK AISI will assess models using its Inspect evaluation framework, while US AISI will test for chemical, biological, radiological, and nuclear (CBRN) risks, as well as cybersecurity threats. OpenAI states it will not deploy a model if either institute identifies critical national security concerns.

The announcement formalizes testing relationships that began informally in 2023. OpenAI reports it has provided early access to GPT-4o, o1-preview, and o1 under these arrangements. The company describes the partnerships as part of its Preparedness Framework, which it says guides internal safety decisions.

The announcement follows a broader industry movement toward pre-deployment testing. OpenAI notes it also works with the AI Safety Institute Consortium and has made voluntary safety commitments in multiple jurisdictions.

No specific evaluation results were disclosed. OpenAI did not detail what threshold of risk would trigger a deployment halt, nor whether safety institutes have previously flagged concerns that delayed or modified releases. The company stated the partnerships will continue as it develops future models.

The testing framework applies only to OpenAI's own model releases. Third-party applications built on OpenAI APIs remain outside this evaluation scope.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale on a per-wire basis.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI