Self-play-based reinforcement learning has enabled AI agents to surpass human expert-level performance in the popular computer game Dota 2 and in board games such as chess and Go. Despite these strong results, recent studies have suggested that self-play may not be as robust as previously thought. A question naturally arises: are such self-play agents vulnerable to adversarial attacks?
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI uses a novel adversarial policy to attack KataGo, the state-of-the-art Go AI system. The team believes theirs is the first successful end-to-end attack against a Go AI system playing at the level of a human professional.
The team summarizes their main contributions as follows:
- We propose a new attack method, hybridizing the attack of Gleave et al. (2020) and AlphaZero-style training (Silver et al., 2018).
- We demonstrate the existence of adversarial policies against the latest Go AI system, KataGo.
- We find that the adversary pursues a simple strategy that tricks the victim into predicting victory, causing it to end the game prematurely.
This work focuses on exploiting professional-level Go AI policies in a restricted setting. The team attacks KataGo, the strongest publicly available Go AI system, albeit not at its full strength. Unlike KataGo, which is trained via self-play, the team trains their agent on games played against a fixed victim agent, using only data from the turns where it is the adversary's move. This "playing the victim" training approach encourages the model to exploit the victim, not imitate it.
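The data-selection idea above can be sketched in a few lines. This is an illustrative toy, not the paper's code: the `Turn` structure and function names are hypothetical, and the point is simply that only positions where the adversary is to move enter the training buffer, so the network learns to exploit the fixed victim rather than imitate it.

```python
# Toy sketch of "playing the victim" data selection (hypothetical
# structures; the real implementation differs). Games are played
# against a frozen victim, and only the adversary's turns are kept
# for training.

from dataclasses import dataclass

@dataclass
class Turn:
    player: str      # "adversary" or "victim"
    state: object    # board position before the move
    move: object     # move actually played

def adversary_training_data(game: list) -> list:
    """Keep only the turns where the adversary moved."""
    return [t for t in game if t.player == "adversary"]

# Example: in a four-turn game, the buffer keeps two positions.
game = [Turn("adversary", "s0", "m0"), Turn("victim", "s1", "m1"),
        Turn("adversary", "s2", "m2"), Turn("victim", "s3", "m3")]
assert len(adversary_training_data(game)) == 2
```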
The team also introduces two distinct families of Adversarial Monte Carlo Tree Search (A-MCTS), Sampling (A-MCTS-S) and Recursive (A-MCTS-R), which prevent the agent from modeling its opponent's moves with its own policy network; instead, the victim's moves are modeled with the victim's own policy. Rather than starting from random initialization, the team uses a curriculum that trains the agent against successively stronger versions of the victim.
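The sampling variant can be illustrated with a minimal sketch, under the assumption (from the description above) that tree descent at victim-to-move nodes samples from the frozen victim's policy, while adversary nodes use ordinary PUCT-style selection. All names and data structures here are hypothetical, not the paper's implementation.

```python
# Minimal A-MCTS-S-style sketch: at victim nodes, moves are drawn
# from the victim's own (fixed) policy instead of the adversary's
# network. Node layout and names are illustrative only.

import math
import random

def select_child(node):
    """PUCT-style selection at an adversary-to-move node."""
    total = sum(c["visits"] for c in node["children"]) + 1
    def score(c):
        q = c["value"] / max(c["visits"], 1)          # exploitation
        u = c["prior"] * math.sqrt(total) / (1 + c["visits"])  # exploration
        return q + u
    return max(node["children"], key=score)

def step(node, victim_policy):
    """One descent step: who is to move decides how we pick a child."""
    if node["to_move"] == "victim":
        # Model the victim with its own policy, not ours.
        weights = [victim_policy(node["state"], c["move"])
                   for c in node["children"]]
        return random.choices(node["children"], weights=weights)[0]
    return select_child(node)
```

The recursive variant (A-MCTS-R) would instead run the victim's full search at such nodes, which is more accurate but far more expensive.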
In their empirical studies, the team used their adversarial policy to attack KataGo without search (the level of a top European player) and KataGo with 64 visits ("near superhuman level"). The proposed policy achieved a win rate of more than 99 percent against KataGo without search and more than 50 percent against KataGo with 64 visits.
While this work suggests that learning through self-play is not as robust as expected and that adversarial policies can defeat the best Go AI systems, the results have been questioned by the machine learning and Go communities. Reddit discussions involving the paper's authors and KataGo developers have focused on quirks of the Tromp-Taylor scoring rules used in the experiments: while the proposed agent gets its wins by "tricking KataGo into ending the game prematurely," it is argued that this tactic would lead to devastating losses under rules more commonly used in human play.
The open-source implementation is on GitHub, and sample games are available on the project website. The paper Adversarial Policies Beat Professional-Level Go AIs is on arXiv.
author: Hecate He | Editor: Michael Sarazen