[openai-blog] OpenAI Baselines: ACKTR & A2C

SEV-3 OpenAI

2026-05-10 2 sources standard

OpenAI released two new reinforcement learning baselines on 18 August 2017: ACKTR (Actor Critic using Kronecker-Factored Trust Region) and A2C (Advantage Actor Critic). The release was announced via the company's blog [source].

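ACKTR is described as a sample-efficient method that applies trust region optimization to actor-critic methods, using a Kronecker-factored approximation of the curvature (K-FAC) to keep the natural-gradient update tractable. A2C is presented as a synchronous, deterministic variant of the Asynchronous Advantage Actor Critic (A3C) algorithm: instead of letting each worker update the parameters asynchronously, it waits for all workers to finish their segment of experience and then applies a single averaged update. Both implementations were added to OpenAI Baselines, the company's collection of high-quality reinforcement learning algorithm implementations.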
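As a rough illustration of the synchronous step in an A2C-style update, the sketch below computes discounted n-step returns and advantages for a batch of environments that are all stepped for the same number of timesteps. It is not taken from the Baselines code: the function names (`n_step_returns`, `batch_advantages`), the argument layout, and the default discount factor are assumptions made for this example, and the critic's value estimates are supplied as plain lists in place of a neural network.

```python
from typing import List


def n_step_returns(rewards: List[float], dones: List[bool],
                   bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Discounted returns for one environment's rollout segment.

    dones[t] is True when the episode ended at step t, so nothing
    after step t contributes to the return at step t.
    (Hypothetical helper, not part of OpenAI Baselines.)
    """
    returns = []
    running = bootstrap_value  # critic's estimate for the state after the segment
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            running = 0.0  # episode boundary: discard the future value
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def batch_advantages(rewards: List[List[float]], dones: List[List[bool]],
                     values: List[List[float]], bootstrap_values: List[float],
                     gamma: float = 0.99) -> List[List[float]]:
    """Advantage = n-step return minus the critic's value, per environment.

    All environments are processed together after every one of them has
    finished its segment, which is the synchronous part of the scheme.
    """
    advantages = []
    for env_r, env_d, env_v, last_v in zip(rewards, dones, values, bootstrap_values):
        returns = n_step_returns(env_r, env_d, last_v, gamma)
        advantages.append([ret - val for ret, val in zip(returns, env_v)])
    return advantages


if __name__ == "__main__":
    adv = batch_advantages(
        rewards=[[1.0, 1.0], [0.0, 1.0]],
        dones=[[False, False], [False, True]],
        values=[[0.5, 0.5], [0.25, 0.5]],
        bootstrap_values=[2.0, 4.0],
        gamma=0.5,
    )
    print(adv)
```

In a full implementation these advantages would weight the policy-gradient term, and the same returns would serve as regression targets for the critic. Because every environment contributes to one batched update, the result does not depend on worker scheduling, which is what makes the synchronous variant deterministic.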

The blog post states that ACKTR is more sample-efficient than A2C on Atari benchmarks, reaching higher rewards for the same number of timesteps, while A2C is noted for being simpler and easier to understand. OpenAI positioned these releases as tools for researchers to reproduce results and build upon existing work in reinforcement learning.

The implementations were made available on GitHub as part of the Baselines repository. OpenAI stated the code was designed to be readable and modifiable, with the goal of serving as reference implementations for the research community.

This release represents a routine addition to OpenAI's open-source tooling from 2017, during a period when the organization was actively publishing reinforcement learning research and reference implementations. No performance issues, unexpected behaviours, or implementation failures were reported in the announcement. The post provides benchmark results and links to the original academic papers describing both algorithms.

The release predates OpenAI's later focus on large language models and occurred during the organization's emphasis on reinforcement learning and game-playing agents.

Why this is an AI incident

Launch-archive bulk classification (10 May 2026). Source signal originates from a real AI provider, regulator, or model-comparison probe; the harm or behavioural change described would not have occurred without the AI system being deployed in the role described. Editor reviewing the archive may amend the rationale per-wire.

Counterfactual "but-for" test per the Editor's Guide.

Codes M1, F10

Providers OpenAI