[openai-blog] New and improved content moderation tooling

SEV-3OpenAI

[openai-blog] New and improved content moderation tooling

2026-05-10 2 sources standard

OpenAI announced on 10 August 2022 the release of updated content moderation tooling, including a new Moderation endpoint and improvements to its existing systems [source]. The company stated the endpoint is designed to help developers identify potentially harmful content across several categories including hate, self-harm, sexual content, and violence.

The announcement described the new endpoint as "free to use" for monitoring OpenAI API inputs and outputs, with the company encouraging developers to integrate it into their applications. OpenAI reported the updated model shows improved performance over its predecessor, though specific accuracy metrics were not disclosed in the initial announcement.

The tooling update followed OpenAI's earlier moderation systems and represented an iteration on the company's approach to content filtering. The endpoint returns classification results across multiple harm categories, allowing developers to set their own thresholds for content filtering decisions.

OpenAI stated the moderation models were trained on data that may not generalise to all use cases, and noted that accuracy may vary depending on context. The company indicated the endpoint would continue to evolve based on user feedback and changing safety requirements.

The release came during a period of increased scrutiny on AI providers regarding content safety mechanisms. OpenAI positioned the tooling as part of its usage policy enforcement, though implementation remained at developer discretion for non-API applications.

The announcement did not specify whether the moderation endpoint would be mandatory for API users or detail how OpenAI would monitor compliance with its usage policies. The company maintained that developers remain responsible for content generated through their applications.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI