Reinforcement Learning Progress

Today, OpenAI released a new result. We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros.

Avneesh Kumar•@avneesh•Jun 25 2026

Of all the games AI has made real progress against so far, this is the one that to me feels closest to the real world and to complex decision making (combining strategy, tactics, coordination, and real-time action).

The agents we train consistently outperform two-week-old agents with a win rate of 90-95%. We did this without training on human-played games. We did design the reward functions, of course, but the algorithm figured out how to play by training against itself.
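For readers who have not seen PPO before, here is a rough sketch of its core idea, the clipped surrogate objective. This is not the training code behind the Dota result; the function name and the clip range of 0.2 are illustrative assumptions, and a real system would compute this over large batches with an autodiff library.

```python
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # illustrative clip range, not a value from this post
) -> float:
    """Mean PPO clipped surrogate objective over a batch of timesteps.

    ratios[i] is new_policy_prob / old_policy_prob for the action taken.
    advantages[i] estimates how much better that action was than average.
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("need at least one timestep")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Taking the minimum makes the objective a pessimistic bound, so the
        # policy gains nothing from moving far away from the old policy.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Training pushes this quantity up by gradient ascent on the policy parameters. The clipping is what keeps each update small, which matters when the data comes from the agent playing against itself.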

This is a big deal because it shows that deep reinforcement learning can solve extremely hard problems whenever you can throw enough computing scale at them and have a really good simulated environment that captures the problem you’re solving. We hope to use this same approach to solve very different problems soon. It's easy to imagine this being applied to environments that look increasingly like the real world.

There are many problems in the world that are far too complex to hand-code solutions for. I expect this to be a large branch of machine learning, and an important step on the road towards general intelligence.


