[openai-blog] Building smarter maps with GPT-4o vision fine-tuning
OpenAI published a case study on 20 November 2024 describing how Grab, the Southeast Asian ride-hailing and delivery platform, fine-tuned GPT-4o's vision capabilities to improve map data extraction [source].
According to the post, Grab processes millions of points of interest daily across eight countries. The company reported that fine-tuning GPT-4o vision reduced hallucinations in address extraction tasks by 30% compared to the base model, while improving accuracy on key fields by 20%.
Grab's engineering team described challenges with the base GPT-4o model producing inconsistent outputs when parsing storefront images and street-level photography. Fine-tuning on domain-specific datasets—images of shopfronts, signage, and local address formats—reportedly stabilised performance across languages including Thai, Vietnamese, and Bahasa Indonesia.
The case study notes that Grab now uses the fine-tuned model to extract business names, operating hours, and address components from user-submitted photos. OpenAI stated the fine-tuning process involved "thousands" of labelled image-text pairs and took several weeks to complete.
OpenAI framed the collaboration as evidence that vision fine-tuning can address reliability issues in production systems handling diverse visual inputs. The post did not disclose baseline error rates, the scale of Grab's training dataset, or whether the 30% hallucination reduction applied uniformly across all languages and image types.
This marks one of the first public disclosures of a major platform operator fine-tuning GPT-4o vision specifically to mitigate hallucination behaviour observed in the base model. The case study suggests OpenAI is positioning fine-tuning as a necessary step for enterprises requiring consistent performance on structured data extraction tasks.
Why this is an AI incident
Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.
Counterfactual "but-for" test per the Editor's Guide.