← Latest · Archive

SEV-3OpenAI
2 sources standard

OpenAI announced on 1 October 2024 that vision capabilities are now available in its fine-tuning API, allowing developers to customise GPT-4o models with image-text pairs [source]. The feature enables fine-tuning on tasks such as visual question answering, object detection, and image captioning.

According to the announcement, developers can upload datasets containing images alongside text prompts and completions. OpenAI states the fine-tuned models can improve performance on domain-specific visual tasks, such as interpreting medical imagery, analysing satellite photos, or processing documents with complex layouts.

The company reports that early testers observed accuracy improvements on specialised visual tasks after fine-tuning with as few as several dozen examples. OpenAI notes that vision fine-tuning follows the same pricing structure as text-only fine-tuning, with costs based on the number of tokens processed during training.

The feature is available to developers in the fine-tuning API dashboard. OpenAI states that fine-tuned vision models support the same image formats as the base GPT-4o model, including PNG, JPEG, and WebP files up to 20MB.

This marks the first time OpenAI has enabled fine-tuning for multimodal inputs beyond text. The company previously restricted vision capabilities to pre-trained models without customisation options. Developers using the API will need to ensure training datasets comply with OpenAI's usage policies, which prohibit certain categories of image content.

OpenAI has not disclosed technical details about how vision fine-tuning affects model behaviour compared to text-only fine-tuning, nor whether the same drift and stability considerations apply to vision-enhanced models.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10
Providers OpenAI