24 February 2021
An artificial intelligence that can remember its previous successes and use them to create new strategies has achieved record high scores on some of the hardest video games on classic Atari consoles.
Many AI systems use reinforcement learning, in which an algorithm is given positive or negative feedback on its progress towards a particular goal after each step it takes, encouraging it towards a particular solution. This technique was used by AI firm DeepMind to train AlphaGo, which beat a world champion Go player in 2016.
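The step-by-step feedback loop can be illustrated with tabular Q-learning, one textbook form of reinforcement learning. It is not the method used for AlphaGo or in this study. The sketch assumes a made-up six-square corridor where moving onto the last square earns a reward of 1 and every other move earns 0. The function name `q_learning_corridor` and the learning-rate, discount and exploration settings are arbitrary, hypothetical choices for illustration.

```python
import random
from typing import List


def q_learning_corridor(length: int = 6, episodes: int = 500, alpha: float = 0.5,
                        gamma: float = 0.9, epsilon: float = 0.2,
                        max_steps: int = 200, seed: int = 0) -> List[List[float]]:
    """Learn q[state][action] (0 = left, 1 = right) for a corridor whose last square is the goal."""
    rng = random.Random(seed)
    goal = length - 1
    q = [[0.0, 0.0] for _ in range(length)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Mostly pick the best-looking action (ties broken at random), sometimes a random one.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            # Move one square, staying inside the corridor.
            s_next = min(max(s + (1 if a == 1 else -1), 0), goal)
            reward = 1.0 if s_next == goal else 0.0  # feedback after every step
            # Nudge the estimate towards reward + discounted value of the square reached.
            target = reward if s_next == goal else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q
```

After training, `q_learning_corridor()` should rate moving right above moving left on every square before the goal: the reward at the end of the corridor has been passed back, step by step, to the squares leading up to it.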
Adrien Ecoffet at Uber AI Labs and OpenAI in California and his colleagues hypothesised that such algorithms often stumble upon encouraging avenues but then jump to another area in the hunt for something more promising, leaving better solutions overlooked.
“What do you do when you don’t know anything about your task?” says Ecoffet. “If you just wave your arms around, it’s unlikely that you’re ever going to make a coffee.”
To solve this problem, the team created an algorithm that remembers all the different approaches it has tried and keeps returning to moments in which it had a high score as a starting point from which to explore further.
The software stores screen grabs from a game as it plays to remember what it has tried, grouping together similar-looking images to identify points in the game that it should return to as jumping-off points. The algorithm's aim is to maximise its score, so whenever it reaches one of these points with a higher score than before, it updates its record for that point with a new screen grab from that part of the game.
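The grouping and update rule can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the team's code: each frame is a small grid of grayscale values, and "similar-looking" is approximated by shrinking the frame to a coarse grid and reducing each block to a handful of brightness levels. The names `frame_to_cell` and `Archive`, the 2×2 grid and the eight brightness levels are all hypothetical choices.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # grayscale pixels, 0-255, one list per row
Cell = Tuple[int, ...]   # coarse "fingerprint" shared by similar-looking frames


def frame_to_cell(frame: Frame, out_h: int = 2, out_w: int = 2, levels: int = 8) -> Cell:
    """Shrink a frame to out_h x out_w blocks and quantise each block's mean brightness."""
    bh, bw = len(frame) // out_h, len(frame[0]) // out_w  # leftover rows/columns are ignored
    cell = []
    for i in range(out_h):
        for j in range(out_w):
            block = [frame[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            mean = sum(block) / len(block)
            cell.append(int(mean * levels // 256))  # map 0-255 onto 0..levels-1
    return tuple(cell)


class Archive:
    """Remembers, for each cell, the best score seen there and a snapshot to restart from."""

    def __init__(self) -> None:
        self.best: Dict[Cell, Tuple[float, object]] = {}

    def update(self, cell: Cell, score: float, snapshot: object) -> bool:
        """Store (score, snapshot) if the cell is new or the score beats the stored one."""
        entry = self.best.get(cell)
        if entry is None or score > entry[0]:
            self.best[cell] = (score, snapshot)
            return True
        return False
```

Frames that differ by a pixel or two usually land in the same cell, which keeps the archive small enough to revisit. A finer grid or more brightness levels would split the game into more, smaller jumping-off points.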
Atari games don't normally allow players to revisit any point in time, but the researchers used an emulator (software that mimics the Atari system) with the added ability to save states and reload them at any time. This meant the algorithm could begin from any point without having to play the game from the start.
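Being able to reload a saved state turns the approach into a simple loop: pick a remembered point, restore it instead of replaying the game, make some exploratory moves and record anything new. The sketch below assumes a toy one-dimensional "corridor" game in place of an Atari title, with hypothetical `save_state`/`load_state` methods standing in for the emulator's save-state feature. It treats each position as a cell, picks cells uniformly at random and explores with random moves. These are simplifications for illustration, not the published method's exact choices.

```python
import random
from typing import Dict, Tuple


class Corridor:
    """Toy deterministic game: walk left or right; the only reward (+1) is at the far end."""

    def __init__(self, length: int = 30) -> None:
        self.length = length
        self.pos = 0
        self.score = 0.0
        self.done = False

    def step(self, action: int) -> None:
        # action 0 = left, 1 = right; movement is clipped at the walls
        if self.done:
            return
        self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), self.length - 1)
        if self.pos == self.length - 1:
            self.score += 1.0
            self.done = True

    # Hypothetical stand-ins for an emulator's save-state / load-state feature.
    def save_state(self) -> Tuple[int, float, bool]:
        return (self.pos, self.score, self.done)

    def load_state(self, state: Tuple[int, float, bool]) -> None:
        self.pos, self.score, self.done = state


def go_explore(env: Corridor, iterations: int = 3000, explore_steps: int = 10,
               seed: int = 0) -> Dict[int, Tuple[float, Tuple[int, float, bool]]]:
    """Return an archive mapping each cell (here: position) to (best score, saved state)."""
    rng = random.Random(seed)
    archive = {env.pos: (env.score, env.save_state())}
    for _ in range(iterations):
        cell = rng.choice(list(archive))   # choose a remembered jumping-off point
        env.load_state(archive[cell][1])   # return to it without replaying from the start
        for _ in range(explore_steps):     # explore onwards with random moves
            if env.done:
                break
            env.step(rng.randrange(2))
            entry = archive.get(env.pos)
            if entry is None or env.score > entry[0]:
                archive[env.pos] = (env.score, env.save_state())
    return archive
```

Because every exploration run starts from a position that has already been reached, `go_explore(Corridor(30))` should eventually record the rewarding far end of the corridor, even though no single run of ten random moves could get there from the start.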
The team set the algorithm to play a collection of 55 Atari games that has become a standard benchmark for reinforcement learning algorithms. It beat state-of-the-art algorithms in those games 85.5 per cent of the time.
In one particularly complex game, Montezuma’s Revenge, the algorithm scored higher than the previous record for reinforcement learning software and also beat the human world record.
Once the algorithm had reached a sufficiently high score, the researchers used the solution it came up with to train a neural network to replicate the strategy and play the game the same way, doing away with the need for reloading save states with an emulator. This alternative approach turned out to be more computationally intensive, as the neural network version of the algorithm created billions of screen grabs while solving each game.
Peter Bentley at University College London says the team’s approach of combining reinforcement learning with an archive of memories could be used to tackle more complex problems. “This is a nice new combination of techniques that seem to provide a real enhancement.”
Journal reference: Nature, DOI: 10.1038/s41586-020-03157-9