
SEV-3 · OpenAI
2 sources · standard

OpenAI published an updated Model Spec on 12 February 2025, detailing how its models should behave when instructions conflict or edge cases arise [source]. The document sets out an order of precedence: developer messages override user messages, which in turn override model defaults. The formalisation follows multiple reported incidents in which ChatGPT and API models produced unexpected outputs when system prompts clashed with user requests.

The spec addresses refusals, stating that models should decline requests that violate usage policies but should "assume best intentions" and avoid over-refusal. OpenAI acknowledges that models sometimes refuse benign requests, a behaviour users have documented in creative writing, medical hypotheticals, and academic research.

The update introduces explicit guidance on "competing objectives." When a user asks the model to ignore safety instructions, the spec directs the model to follow developer-set rules instead. When no developer message exists, the model falls back to OpenAI's own guidelines. This ordering matters for enterprise deployments, where a custom system prompt may conflict with behaviour learned in base model training.
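
The precedence described above can be made concrete with a small sketch. It is illustrative only and models just the two rules as this wire summarises them; it is not OpenAI's implementation, and the function names (resolve_instruction, resolve_safety_override) and example strings are hypothetical.

```python
from typing import Optional


def resolve_instruction(developer: Optional[str],
                        user: Optional[str],
                        default: str) -> str:
    """General case, as summarised in this wire:
    developer message > user message > model default."""
    if developer is not None:
        return developer
    if user is not None:
        return user
    return default


def resolve_safety_override(developer_rules: Optional[str],
                            openai_guidelines: str) -> str:
    """Case where a user asks the model to ignore safety instructions.

    The user's request is not a candidate at all (an assumption of this
    sketch): developer-set rules apply if present, otherwise OpenAI's
    own guidelines apply.
    """
    if developer_rules is not None:
        return developer_rules
    return openai_guidelines


# Hypothetical example: a developer and a user disagree on reply language.
chosen = resolve_instruction(
    developer="reply in French",
    user="reply in English",
    default="reply in the user's language",
)
print(chosen)  # reply in French

# Hypothetical example: no developer message, user asks to drop safety rules.
print(resolve_safety_override(None, "OpenAI guidelines"))  # OpenAI guidelines
```

The sketch treats each conflict as a strict ordering. The spec itself is written in natural language and, as OpenAI notes, models do not always follow it perfectly, so real behaviour may differ from this deterministic picture.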

OpenAI states the spec is used to generate training data and evaluate model outputs, but notes "models don't always follow the spec perfectly." The company solicits public feedback, indicating the framework remains under revision.

The publication does not disclose which models currently implement the spec in full, nor does it specify version numbers or rollout timelines. An earlier version of the Model Spec was published in May 2024. The move toward transparency follows broader industry scrutiny of undocumented model behaviour changes and inconsistent refusal patterns reported by developers and researchers throughout 2024.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale for each wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10
Providers OpenAI