[openai-blog] Introducing the OpenAI Safety Bug Bounty program
OpenAI has launched a Safety Bug Bounty program offering rewards between $200 and $20,000 for researchers who identify safety vulnerabilities in its AI systems [source]. The program targets failures in model behaviour, content filtering, and safety guardrails rather than traditional software security flaws.
Eligible submissions include jailbreaks that bypass safety mitigations, prompt injections that cause unintended behaviour, and methods to extract training data or generate prohibited content. OpenAI states it will also consider reports of "model behaviour that could lead to real-world harm" [source].
The program covers ChatGPT, the API, and DALL-E, but excludes issues already documented in OpenAI's usage policies or model limitations pages. Researchers must demonstrate reproducible exploits and avoid testing on production systems without permission.
Bounty amounts scale with severity and exploitability. Critical vulnerabilities enabling "significant harm at scale" qualify for the maximum $20,000 reward, while lower-severity issues such as minor filter bypasses receive $200 to $2,000 [source]. OpenAI reserves discretion over final classifications and payment amounts.
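As a rough illustration of the reward bands described above, the stated figures can be written as a small lookup. This is a sketch under assumptions, not OpenAI's actual classification scheme: the tier labels `low` and `critical` and the function name `reward_range` are hypothetical, and only the dollar amounts come from the summary. The summary gives no minimum for critical reports and no figures for any intermediate tier, so those are left open rather than filled in.

```python
from typing import Optional, Tuple

# Overall program range quoted in the summary (USD).
PROGRAM_RANGE: Tuple[Optional[int], int] = (200, 20_000)

# Hypothetical tier labels; only the dollar figures come from the summary.
STATED_TIERS = {
    "low": (200, 2_000),        # e.g. minor filter bypasses
    "critical": (None, 20_000),  # maximum reward stated; minimum not stated
}


def reward_range(severity: str) -> Tuple[Optional[int], int]:
    """Return the (min, max) reward in USD for a severity label.

    Labels the summary does not describe fall back to the overall
    program range, since final amounts are at OpenAI's discretion.
    """
    return STATED_TIERS.get(severity.strip().lower(), PROGRAM_RANGE)


if __name__ == "__main__":
    for label in ("low", "medium", "critical"):
        print(label, reward_range(label))
```

Falling back to the full program range for unlisted tiers keeps the sketch from implying payout bands the announcement does not describe.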
The program follows similar initiatives from Anthropic and Google, which launched safety bounties in 2024 and 2025 respectively. Security researchers have previously disclosed jailbreaks and filter bypasses through informal channels, sometimes prompting emergency patches.
OpenAI notes that submissions will be reviewed by its safety team and that disclosure timelines will be coordinated with researchers. The company has not published aggregate statistics on vulnerabilities reported through internal channels prior to this program.
Researchers can submit reports through a dedicated portal linked from the announcement page.
Why this is an AI incident
Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe, and the harm or behavioural change described would not have occurred had the AI system not been deployed in the role described. An editor reviewing the archive may amend the rationale for each wire.
This classification applies the counterfactual "but-for" test set out in the Editor's Guide.