[openai-blog] gpt-oss-safeguard technical report

SEV-3 OpenAI

2026-05-10 2 sources standard

OpenAI has published a technical report detailing GPT-OSS-Safeguard, a classifier system designed to detect open-source software security vulnerabilities in code generated by its models [source]. The report describes a multi-stage approach combining static analysis, dynamic testing, and machine learning to identify potential security flaws before code reaches users.

The system targets common vulnerability patterns including SQL injection, cross-site scripting, path traversal, and insecure deserialization. According to the report, GPT-OSS-Safeguard achieved 89% precision and 76% recall on a held-out test set of vulnerable code samples. The classifier runs as a post-processing layer after code generation, flagging outputs that exceed risk thresholds.
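To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and a threshold-based flag are conventionally computed. It is not taken from the report: the confusion counts, the `flag` helper, and the 0.5 default threshold are hypothetical, chosen only so that the counts roughly reproduce the stated 89% precision and 76% recall. The F1 value is derived from those two figures and is not itself reported.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing classifier flags against ground-truth labels."""
    tp: int  # vulnerable samples that were flagged
    fp: int  # safe samples that were flagged
    fn: int  # vulnerable samples that were missed


def precision(c: Confusion) -> float:
    # Share of flagged samples that were actually vulnerable.
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    # Share of vulnerable samples that were flagged.
    return c.tp / (c.tp + c.fn)


def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


def flag(risk_score: float, threshold: float = 0.5) -> bool:
    # Hypothetical post-processing check: flag outputs whose score
    # exceeds the threshold. The report's actual thresholds are not given.
    return risk_score > threshold


# Hypothetical counts that approximate the reported figures.
example = Confusion(tp=760, fp=94, fn=240)
example_precision = precision(example)  # roughly 0.89
example_recall = recall(example)        # 0.76

# F1 implied by the reported 89% precision and 76% recall.
implied_f1 = f1(0.89, 0.76)
```

Under these figures the implied F1 is about 0.82. Raising the threshold in a scheme like this would typically trade recall for precision, which is why the two numbers are reported together.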

OpenAI states the safeguard has been deployed in production since August 2025 across ChatGPT and API endpoints that generate code. The report includes ablation studies showing that removing the dynamic testing component reduced recall by 12 percentage points, while removing static analysis reduced precision by 8 percentage points.
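The ablation figures are stated in percentage points, which differ from relative percentages. The sketch below works through the arithmetic under one assumption that the summary does not confirm: that the ablation baselines equal the headline 89% precision and 76% recall. The `ablate` helper is illustrative only.

```python
def ablate(baseline_pct: float, drop_pp: float) -> tuple[float, float]:
    """Return (metric after ablation, relative drop) for a percentage-point drop."""
    after = baseline_pct - drop_pp
    relative_drop = drop_pp / baseline_pct
    return after, relative_drop


# Assumed baselines: the headline figures (not confirmed as the ablation baseline).
recall_after, recall_relative = ablate(76.0, 12.0)        # dynamic testing removed
precision_after, precision_relative = ablate(89.0, 8.0)   # static analysis removed

print(f"recall: 76.0% -> {recall_after:.1f}% ({recall_relative:.1%} relative drop)")
print(f"precision: 89.0% -> {precision_after:.1f}% ({precision_relative:.1%} relative drop)")
```

On that assumption, recall would fall to 64% without dynamic testing (a relative drop of about 16%) and precision to 81% without static analysis (about 9%).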

The technical report does not disclose which specific models currently use the safeguard, nor does it provide data on real-world interception rates or on how false positives affect user experience. OpenAI notes that the system is "continuously updated" as new vulnerability patterns emerge, but gives no timeline for public access to the classifier or its training data.

The publication follows increased scrutiny of AI-generated code security after multiple incidents where developers deployed vulnerable code produced by language models. The report positions the safeguard as a defense-in-depth measure rather than a replacement for human code review and security testing.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI