
SEV-3 OpenAI
2 sources standard

OpenAI announced the formation of its Red Teaming Network on 19 September 2023, a structured programme to identify risks in its AI models before public deployment [source]. The network recruits external domain experts to probe models for failures in areas including cybersecurity, biological threats, fairness, and misinformation.

Red teaming is adversarial testing in which participants attempt to elicit harmful, biased, or otherwise problematic outputs from models under controlled conditions. OpenAI stated the network would inform safety mitigations and model behaviour adjustments ahead of releases. Participants receive early access to unreleased models and contribute findings that feed into internal safety reviews.

The announcement followed public criticism of GPT-4's March 2023 release, when researchers and users reported the model generating detailed instructions for synthesising hazardous materials and producing biased outputs in sensitive contexts. OpenAI acknowledged that red teaming conducted prior to GPT-4's launch identified risks that required additional safeguards, though some issues persisted post-release.

The network formalises what had previously been ad hoc external testing. OpenAI indicated it would expand the programme to include more diverse expertise, particularly in domains where model failures carry significant real-world consequences. The company did not disclose how many participants were enrolled at launch or specify timelines for testing cycles relative to model deployment.

The initiative represents a shift toward structured external scrutiny of model behaviour before release. However, the programme's effectiveness depends on whether findings translate into substantive changes to model outputs and whether testing covers the full range of use cases encountered in production environments.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale per wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10
Providers OpenAI