
SEV-3 · OpenAI
2 sources standard

OpenAI disclosed on 27 June 2024 that GPT-4 makes mistakes that a GPT-4-based model can detect, and described CriticGPT, a model trained to catch errors in model-generated code [source].

The company trained CriticGPT on code containing intentionally inserted bugs, teaching it to identify errors in GPT-4's outputs. In human evaluations, CriticGPT outperformed unassisted human reviewers at spotting problems in ChatGPT responses. Reviewers working with CriticGPT assistance produced more comprehensive critiques than humans working alone, and reported fewer hallucinated bugs than the model working alone.

OpenAI reported that CriticGPT caught bugs in 85% of naturally occurring GPT-4 coding errors, compared to 25% for human reviewers working without assistance. The system was trained using reinforcement learning from human feedback (RLHF), the same technique used to align GPT-4 itself.

The disclosure indicates that GPT-4 generates erroneous code often enough for OpenAI to train a dedicated model to review it. OpenAI stated the approach helps address a scaling challenge: as models improve, their mistakes become harder for humans to identify without assistance.

CriticGPT sometimes hallucinates problems that do not exist, a behaviour OpenAI acknowledged as a limitation. The company said it is working to reduce these false positives. The model was trained only on short ChatGPT responses and does not yet handle longer or more complex tasks effectively.
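
For readers unfamiliar with how figures such as a bug-detection rate or a false-positive rate are typically derived, the following is a minimal sketch in Python. It is not OpenAI's evaluation code: the `Review` schema, the function names, and the sample data are all hypothetical, and the sketch assumes that the only known bugs are the deliberately inserted ones, so any flag elsewhere is counted as spurious.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One critic's review of one code sample (hypothetical schema)."""
    inserted_bug_lines: frozenset[int]  # lines where a bug was deliberately planted
    flagged_lines: frozenset[int]       # lines the critic reported as buggy


def detection_rate(reviews: list[Review]) -> float:
    """Fraction of planted bugs that the critic flagged."""
    planted = sum(len(r.inserted_bug_lines) for r in reviews)
    caught = sum(len(r.inserted_bug_lines & r.flagged_lines) for r in reviews)
    return caught / planted if planted else 0.0


def false_positive_rate(reviews: list[Review]) -> float:
    """Fraction of flags that do not match any planted bug.

    Assumption: planted bugs are the only real bugs. In practice a flag
    outside the planted set could still be a genuine, naturally occurring bug.
    """
    flagged = sum(len(r.flagged_lines) for r in reviews)
    spurious = sum(len(r.flagged_lines - r.inserted_bug_lines) for r in reviews)
    return spurious / flagged if flagged else 0.0


# Illustrative data only; these are not results reported by OpenAI.
reviews = [
    Review(frozenset({3, 7}), frozenset({3, 7, 12})),  # both bugs caught, one extra flag
    Review(frozenset({5}), frozenset()),               # bug missed entirely
    Review(frozenset({2}), frozenset({2})),            # clean catch
]

print(f"detection rate: {detection_rate(reviews):.2f}")
print(f"false-positive rate: {false_positive_rate(reviews):.2f}")
```

The two numbers pull against each other: a critic that flags more lines tends to catch more planted bugs but also raises more spurious flags, which is the trade-off behind the limitation described above.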

OpenAI described CriticGPT as part of research into aligning superhuman AI systems, suggesting current models already produce outputs that exceed unassisted human ability to evaluate. The company has not announced plans to deploy CriticGPT in production or make it available to users.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale per wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10
Providers OpenAI