
SEV-1 · OpenAI · Anthropic · Google

Multi-model behavioural drift detected on the established refusal-probe set; three frontier models now answer prompts they previously refused.

2026-05-10 · 5 sources · flash

Multi-model behavioural drift on refusal-probe set

[DRAFT: flash bulletin to finish] The internal refusal-probe set, stable for six weeks, now produces answers from GPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro on prompts that all three previously refused. Mistral Large's refusal pattern is unchanged. No provider has announced a coordinated deployment.

Reproducibility: every prompt was re-tested twice with identical parameters, and the new behaviour was stable across both runs. Right-of-reply requests have been sent to all four named providers under the 4-hour Sev-1 fast-track.
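The reproducibility check above can be sketched in code. This is a minimal illustration under stated assumptions, not the tooling actually used for this bulletin: the function name `stable_flips`, the record layout, and the model and prompt identifiers are all hypothetical, and the sample data is invented purely to exercise the logic. A prompt counts as drifted only if the baseline was a refusal and every re-test run produced an answer.

```python
from collections import defaultdict


def stable_flips(baseline, retests, min_runs=2):
    """Return {model: [prompt ids]} where a baseline refusal became a stable answer.

    baseline: {(model, prompt_id): True if the model refused at baseline}
    retests:  {(model, prompt_id): [True if refused, one entry per re-test run]}
    min_runs: minimum number of re-test runs required (hypothetical default of 2,
              mirroring the "re-tested twice" rule in the bulletin)
    """
    flips = defaultdict(list)
    for (model, prompt_id), was_refused in baseline.items():
        runs = retests.get((model, prompt_id), [])
        if len(runs) < min_runs:
            continue  # not enough re-tests to call the behaviour stable
        # Drift = refused at baseline AND answered in every re-test run.
        if was_refused and not any(runs):
            flips[model].append(prompt_id)
    return {model: sorted(ids) for model, ids in flips.items()}


# Invented sample data; identifiers are placeholders, not real probe results.
baseline = {
    ("model-a", "p1"): True,
    ("model-a", "p2"): True,
    ("model-b", "p1"): True,
    ("model-b", "p2"): True,
}
retests = {
    ("model-a", "p1"): [False, False],  # answered twice: stable flip
    ("model-a", "p2"): [False, True],   # mixed results: not counted as stable
    ("model-b", "p1"): [True, True],    # still refused: unchanged
    ("model-b", "p2"): [True, True],    # still refused: unchanged
}

drift = stable_flips(baseline, retests)
print(drift)
```

Requiring every re-test run to agree keeps one-off sampling noise out of the drift count; a model with no stable flips, like the unchanged one here, does not appear in the result at all.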

Why this is an AI incident

The probe set has been stable for six weeks. The simultaneous shift in three models' responses is not explained by random variation and originates in either coordinated deployment changes or a shared training-data update. In either case, without the AI providers' deployments there would be no shift.

This applies the counterfactual "but-for" test from the Editor's Guide.

Codes M7, F10

Providers OpenAI, Anthropic, Google, Mistral