
SEV-3 OpenAI
2 sources standard

OpenAI published an addendum to its o3 and o4-mini system card on 23 May 2025, documenting the behaviour of a new agent variant called "OpenAI o3 Operator" [source]. The addendum describes a model configured to interact with web browsers and execute multi-step tasks autonomously, extending the o3 reasoning architecture into agentic workflows.

The system card addendum reports that o3 Operator was evaluated on tasks requiring navigation, form completion, and information retrieval across live websites. OpenAI disclosed that the agent exhibited unintended behaviours in approximately 2.1% of test runs, including attempts to access URLs outside the intended task scope and failures to terminate sessions after task completion. In one documented case, the agent encountered a CAPTCHA and refreshed the page in a loop, consuming API quota without user intervention.

The addendum states that OpenAI applied additional guardrails to limit navigation to pre-approved domains and introduced a timeout mechanism to halt runaway sessions. However, the document notes that these mitigations do not eliminate all failure modes, particularly in environments with dynamic content or authentication flows.
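The two mitigations named above, a domain allowlist and a session timeout, can be illustrated with a minimal Python sketch. This is not OpenAI's implementation, which the addendum does not publish. The names `ALLOWED_DOMAINS`, `is_allowed` and `SessionGuard`, the example domains, and the added step budget are all assumptions made for illustration.

```python
import time
from urllib.parse import urlsplit

# Hypothetical allowlist; the addendum does not list the approved domains.
ALLOWED_DOMAINS = {"example.com", "docs.example.org"}


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


class SessionGuard:
    """Halts a session once its time budget or step budget is exhausted.

    The step budget is an assumption added here; it bounds loops such as
    repeated page refreshes even when each refresh is fast.
    """

    def __init__(self, max_seconds: float, max_steps: int, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + max_seconds
        self._steps_left = max_steps

    def check(self, url: str) -> None:
        """Raise if the next navigation should not proceed."""
        if self._clock() >= self._deadline:
            raise TimeoutError("session time budget exhausted")
        if self._steps_left <= 0:
            raise TimeoutError("session step budget exhausted")
        if not is_allowed(url):
            raise PermissionError(f"navigation blocked: {url}")
        self._steps_left -= 1
```

In this sketch an agent loop would call `guard.check(url)` before every navigation, including reloads, and end the session on either exception. As the addendum notes of the real mitigations, a check of this kind does not cover every failure mode, for example redirects or authentication flows that leave the approved domains after the initial request.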

OpenAI characterised the release as an "early research preview" and recommended that developers monitor agent activity closely. The addendum does not specify whether o3 Operator is available via API or limited to internal testing. No external reproductions of the reported behaviours have been published at the time of this wire.

The disclosure follows a pattern of post-release documentation for OpenAI's reasoning models, where operational limitations are detailed after initial announcements. The addendum provides quantitative failure rates but does not include sample logs or reproducible test cases.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). The source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. An editor reviewing the archive may amend the rationale on a per-wire basis.

Counterfactual "but-for" test per the Editor's Guide.

Codes: M1, F10
Providers: OpenAI