[openai-blog] Learning to play Minecraft with Video PreTraining

SEV-3 OpenAI

2026-05-10 2 sources standard

OpenAI published research on 23 June 2022 describing a model trained to play Minecraft through video pretraining, marking an early application of behavioural cloning from unlabelled video data [source]. The system, called Video PreTraining (VPT), learned to perform complex in-game tasks by observing 70,000 hours of publicly available Minecraft gameplay footage.

The model demonstrated the ability to craft diamond tools, a task OpenAI said typically takes proficient human players over 20 minutes, or roughly 24,000 actions. OpenAI reported that VPT could learn from video without keyboard and mouse input labels for most of its training data: a smaller labelled dataset was first used to train an inverse dynamics model, which then supplied action labels for the rest.

The research represents a shift from text-based model development toward vision-based behavioural learning. OpenAI noted the approach could extend to other domains where large video datasets exist but action labels are scarce or expensive to obtain.

The team released the model weights and training code, alongside a dataset of labelled contractor gameplay. The contractor data comprised 2,000 hours of footage with associated keyboard and mouse inputs recorded at each frame.

VPT's pipeline used the inverse dynamics model to infer the action taken between consecutive video frames, then used those inferred labels to train the behavioural policy on the larger unlabelled corpus. Fine-tuning on specific tasks improved performance further, with the diamond tool crafting agent trained from a model first adapted to obtain iron tools.
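
The three stages described above (fit an inverse dynamics model on a small labelled set, use it to pseudo-label unlabelled video, then clone behaviour from the pseudo-labels) can be illustrated with a deliberately tiny sketch. This is not OpenAI's code: the "frames" here are plain integers on a number line, the models are lookup tables rather than neural networks, and every name (`train_idm`, `pseudo_label`, `behavioural_cloning`) is hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_idm(labelled):
    """Toy inverse dynamics model: map a frame-to-frame change to its most common action."""
    votes = defaultdict(Counter)
    for frame, next_frame, action in labelled:
        votes[next_frame - frame][action] += 1
    return {delta: counts.most_common(1)[0][0] for delta, counts in votes.items()}

def pseudo_label(idm, video):
    """Infer an action for each consecutive frame pair of an unlabelled video."""
    return [
        (frame, idm[next_frame - frame])
        for frame, next_frame in zip(video, video[1:])
        if (next_frame - frame) in idm  # skip transitions the IDM never saw
    ]

def behavioural_cloning(pairs):
    """Toy policy: for each observed frame, pick the most frequent pseudo-labelled action."""
    votes = defaultdict(Counter)
    for frame, action in pairs:
        votes[frame][action] += 1
    return {frame: counts.most_common(1)[0][0] for frame, counts in votes.items()}

# Small labelled set (stands in for contractor data): (frame, next_frame, action).
labelled = [(0, 1, "right"), (5, 4, "left"), (3, 3, "wait"), (2, 3, "right")]

# Larger unlabelled corpus (stands in for public gameplay video): frames only.
unlabelled_videos = [[0, 1, 2, 3, 3], [0, 1, 2, 3]]

idm = train_idm(labelled)
pairs = [pair for video in unlabelled_videos for pair in pseudo_label(idm, video)]
policy = behavioural_cloning(pairs)
```

In this toy setting the policy learns to move right until it reaches frame 3 and then wait, even though none of the unlabelled videos carried action labels. The real system differs in every detail of scale and modelling; the sketch only shows how the labelled and unlabelled data relate to each other.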

The work predated the release of ChatGPT in November 2022 and explored video as a training modality for sequential decision-making systems. No commercial deployment of VPT for Minecraft or other games was announced.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI