SEV-3 OpenAI

[openai-blog] From hard refusals to safe-completions: toward output-centric safety training

2026-05-10 2 sources standard

OpenAI announced on 7 August 2025 that it had shifted its safety training approach for GPT-5 from "hard refusals" to what it calls "safe-completions" [source]. The company describes this as a move toward "output-centric safety training", in place of blocking requests outright.

Under the previous approach, models would refuse certain prompts with messages such as "I can't help with that." The new method attempts to provide a response that addresses the user's underlying need while steering away from harmful content. OpenAI states this reduces user frustration and improves task completion rates.

The blog post does not specify which categories of prompts now receive safe-completions instead of refusals, nor does it provide examples of the new behaviour. OpenAI says the training relies on reinforcement learning from human feedback and constitutional AI techniques to shape outputs that are "helpful and harmless."

Independent observers have not yet published systematic evaluations of GPT-5's refusal boundaries under this policy. The change raises questions about consistency: what one user considers a safe completion, another may view as an inadequate guardrail. OpenAI has not disclosed whether the safe-completion approach applies uniformly across all API tiers or whether enterprise customers can configure refusal behaviour.

The announcement follows broader industry debate over refusal rates and model alignment. Some researchers argue that hard refusals are necessary for high-risk categories, while others contend they lead to jailbreaking attempts and poor user experience. OpenAI's shift represents a significant policy change in how frontier models handle sensitive prompts, though the practical impact on model behaviour remains to be documented by third parties.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale per wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI