[openai-blog] New AI classifier for indicating AI-written text
OpenAI released an AI text classifier on 31 January 2023, designed to distinguish between human-written and AI-generated content [source]. The tool was positioned as a response to concerns about AI-generated misinformation, academic dishonesty, and automated spam.
The classifier returned one of five confidence levels: "very unlikely", "unlikely", "unclear if it is", "possibly", or "likely" AI-generated. OpenAI acknowledged significant limitations at launch. The tool correctly identified only 26% of AI-written text as "likely AI-generated", while incorrectly labelling human-written text as AI-generated 9% of the time [source].
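To put those two rates in context, the sketch below estimates what share of "likely AI-generated" flags would be correct for a given mix of texts. It is illustrative arithmetic only, not OpenAI's method. The function name `flagged_precision` and the base rates are hypothetical, and the sketch assumes the 26% and 9% figures carry over unchanged to the texts being checked.

```python
def flagged_precision(base_rate: float, tpr: float = 0.26, fpr: float = 0.09) -> float:
    """Share of flagged texts that really are AI-written (Bayes' rule).

    base_rate: assumed fraction of submitted texts that are AI-written
               (a hypothetical input, not a figure from OpenAI).
    tpr: fraction of AI-written text flagged "likely AI-generated" (26%).
    fpr: fraction of human-written text wrongly flagged (9%).
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    true_flags = tpr * base_rate           # AI-written texts correctly flagged
    false_flags = fpr * (1.0 - base_rate)  # human-written texts wrongly flagged
    total = true_flags + false_flags
    return true_flags / total if total else 0.0


# Hypothetical mixes of AI-written text among the submissions being checked.
for rate in (0.05, 0.20, 0.50):
    print(f"{rate:.0%} AI-written -> {flagged_precision(rate):.1%} of flags correct")
```

Under these assumptions, roughly 13% of flags are correct when 5% of texts are AI-written, about 42% at a 20% share, and about 74% at an even split. This is consistent with OpenAI's caution against relying on the tool as a primary decision-maker.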
Performance degraded further on short texts. The classifier required a minimum of 1,000 characters (roughly 150–250 words) and was unreliable below that threshold [source]. It also performed poorly on text written by children or non-native English speakers, and could easily be evaded by editing AI output [source].
OpenAI stated the classifier was trained on pairs of human-written and AI-generated text on the same topic, with the AI-generated text drawn from 34 text-generation models [source]. The company noted it would be "significantly less reliable" on text produced by newer AI systems not included in the training data [source].
The tool was offered free of charge and did not require an OpenAI account. OpenAI described it as part of ongoing research into AI-generated content detection, while cautioning that it "should not be used as a primary decision-making tool" [source]. The company invited feedback to improve the classifier's accuracy and utility.
OpenAI withdrew the classifier on 20 July 2023, citing its low rate of accuracy.
Why this is an AI incident
Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale per wire.
Counterfactual "but-for" test per the Editor's Guide.