[openai-blog] DALL·E 2 pre-training mitigations

SEV-3 OpenAI

2026-05-10 2 sources standard

OpenAI disclosed on 28 June 2022 that DALL·E 2 underwent pre-training mitigations to address bias and harmful content risks before public deployment [source]. The company filtered the training dataset to reduce violent, sexual, and hateful imagery, then re-trained the model on the cleaned data.

The mitigations targeted three categories: graphic violent and sexual imagery, hate symbols, and images of identifiable individuals used without consent. OpenAI reported removing approximately 6% of the training data, using automated classifiers and human review. The company stated that the filtering reduced the model's tendency to generate violent outputs when prompted with neutral terms.
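
As a rough illustration of classifier-based dataset filtering, the sketch below drops every image whose score from a harm classifier meets a threshold and reports the share of the dataset removed. The `ScoredImage` type, the `filter_dataset` and `removed_fraction` helpers, the scores, and the 0.5 threshold are all hypothetical and are not taken from OpenAI's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredImage:
    image_id: str
    harm_score: float  # hypothetical classifier probability in [0, 1]


def filter_dataset(images, threshold=0.5):
    """Split images into (kept, removed) by comparing scores to a threshold."""
    kept, removed = [], []
    for img in images:
        (removed if img.harm_score >= threshold else kept).append(img)
    return kept, removed


def removed_fraction(kept, removed):
    """Share of the original dataset that the filter dropped."""
    total = len(kept) + len(removed)
    return len(removed) / total if total else 0.0


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    sample = [
        ScoredImage("a", 0.02),
        ScoredImage("b", 0.91),
        ScoredImage("c", 0.10),
        ScoredImage("d", 0.40),
    ]
    kept, removed = filter_dataset(sample)
    print([img.image_id for img in kept], removed_fraction(kept, removed))
```

In a setup like this, lowering the threshold removes more data and catches more harmful images at the cost of discarding more benign ones, which is where human review of borderline cases would plausibly fit.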

Post-mitigation testing showed the model generated violent imagery 30% less frequently in response to benign prompts compared to the unfiltered baseline. Bias measurements indicated the filtered model reduced gender and racial stereotyping in occupational prompts, though OpenAI acknowledged residual bias remained. The company noted the mitigations did not eliminate all problematic outputs and that additional safety systems would operate at inference time.
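
A figure such as "30% less frequently" is a relative reduction against the baseline rate, not a drop in percentage points. The sketch below shows the arithmetic; the `relative_reduction` helper and the example counts are assumptions chosen for illustration, not figures from the disclosure.

```python
def relative_reduction(baseline_rate, filtered_rate):
    """Fractional drop in a rate relative to the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - filtered_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical counts: flagged outputs per 1,000 benign prompts.
    baseline = 100 / 1000  # unfiltered model
    filtered = 70 / 1000   # model trained on filtered data
    print(f"{relative_reduction(baseline, filtered):.0%}")
```

With these made-up counts the absolute change is three percentage points, while the relative reduction is roughly 30%.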

The disclosure followed earlier limited beta access granted in April 2022. OpenAI stated the pre-training approach complemented prompt-based filters and content policy enforcement applied when users generate images. The company published the methodology as part of a staged rollout strategy, expanding access to 1 million users by July 2022.

The announcement provided technical detail on dataset curation but did not specify error rates for the automated classifiers or disclose what proportion of harmful content the filters missed. OpenAI indicated ongoing monitoring would inform further model updates.
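
For context, the undisclosed "proportion of harmful content the filters missed" corresponds to a false-negative rate, which could in principle be estimated from a human-labelled audit sample. The sketch below assumes such a sample exists; the `miss_rate` helper and the audit data are hypothetical.

```python
def miss_rate(audit):
    """Estimate the share of harmful images that the filter failed to remove.

    `audit` is an iterable of (is_harmful, was_removed) boolean pairs,
    where `is_harmful` comes from human review (hypothetical data).
    """
    harmful = [was_removed for is_harmful, was_removed in audit if is_harmful]
    if not harmful:
        raise ValueError("audit sample contains no harmful images")
    missed = sum(1 for was_removed in harmful if not was_removed)
    return missed / len(harmful)


if __name__ == "__main__":
    # Made-up audit: three harmful images, one of which slipped through.
    audit = [
        (True, True),
        (True, False),
        (False, False),
        (True, True),
        (False, True),
    ]
    print(miss_rate(audit))
```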

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale per wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI