[openai-blog] Deep double descent
OpenAI published research on 5 December 2019 describing a phenomenon called "deep double descent," in which test performance first improves, then worsens, then improves again as model size, dataset size, or training time increases [source]. The findings challenge conventional assumptions about overfitting and model capacity in neural networks.
The research team observed that test error does not follow a simple U-shaped curve as models grow larger. Instead, test error rises near a critical threshold at which the model is only just able to fit the training set, then falls again as models become substantially larger. The result is two descent phases in test error separated by a peak.
The phenomenon appears across multiple dimensions: model size, training time, and dataset size. In experiments with ResNet architectures on CIFAR-10 and CIFAR-100, researchers documented sharp spikes in test error at specific model widths, followed by steady improvement as width increased further. Similar patterns emerged when varying the number of training epochs, with test error rising partway through training before falling again.
The paper notes that models at the interpolation threshold, where they can only just fit the training data, exhibit the worst generalisation. Beyond this point, additional capacity enables models to fit the training data in ways that generalise better to test data.
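The behaviour around the interpolation threshold can be illustrated with a much simpler model than the ResNets used in the research. The following is a minimal sketch, not the paper's experimental setup: it assumes a synthetic noisy linear regression task, random ReLU features, and a minimum-norm least-squares fit. The function names (random_feature_error, mean_error) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_feature_error(n_features, seed, n_train=40, n_test=500, dim=10, noise=0.5):
    """Test MSE of a minimum-norm least-squares fit on random ReLU features.

    Illustrative toy setup (an assumption, not the paper's experiments):
    the target is a noisy linear function of Gaussian inputs.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    x_train = rng.normal(size=(n_train, dim))
    x_test = rng.normal(size=(n_test, dim))
    y_train = x_train @ w_true + noise * rng.normal(size=n_train)
    y_test = x_test @ w_true  # noiseless targets for evaluation

    # Fixed random projection followed by ReLU; only the output layer is fitted.
    proj = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
    phi_train = np.maximum(x_train @ proj, 0.0)
    phi_test = np.maximum(x_test @ proj, 0.0)

    # The pseudo-inverse gives the least-squares solution, and the
    # minimum-norm interpolating solution once n_features >= n_train.
    coef = np.linalg.pinv(phi_train) @ y_train
    return float(np.mean((phi_test @ coef - y_test) ** 2))


def mean_error(n_features, n_seeds=10):
    """Average the test error over several random seeds to reduce variance."""
    return float(np.mean([random_feature_error(n_features, s) for s in range(n_seeds)]))


if __name__ == "__main__":
    # Sweep model width through the interpolation threshold (n_features == n_train == 40).
    for width in (5, 20, 40, 80, 400):
        print(f"features={width:4d}  mean test MSE={mean_error(width):.3f}")
```

In this sketch the width of 40 equals the number of training points, so that is where the test error would be expected to peak before falling again at larger widths.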
OpenAI's findings suggest that practitioners may encounter unexpected performance degradation when scaling models through specific capacity ranges. The research does not propose mitigation strategies but documents the behaviour across standard architectures and datasets. The work was conducted by Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever, building on earlier work by Mikhail Belkin and colleagues on double descent in simpler models.
Why this is an AI incident
Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.
Counterfactual "but-for" test per the Editor's Guide.