DeepMind’s AI Program AlphaGo Zero Doesn’t Need to Learn from Humans

Fagjun | Published 2017-10-30 00:33

Google's artificial intelligence group DeepMind has created AlphaGo Zero, an AI program that can learn on its own, without help from humans.



A Go game board

Earlier this year, Google's AI program AlphaGo defeated Ke Jie, world champion of the board game Go. This was a huge milestone in the development of AI. Not only had an artificial intelligence program learned a complicated game, it had also surpassed a human master.

Go is an ancient Chinese game, believed to be the oldest board game still played today. Its rules are simple, but its gameplay is enormously complex: the number of legal board positions (roughly 10^170) far exceeds the number of atoms in the observable universe (roughly 10^80). AlphaGo managed to master this ancient, complex game by studying hours upon hours of human gameplay. The new version, AlphaGo Zero, has gone beyond needing to learn from human teachers.



The Self-Taught Machine


How does a machine learn to play Go by itself? [Image by Saran_Poroong, Getty Images]

So how does a machine teach itself without much input from humans? AlphaGo Zero basically just played the game against itself over and over, millions of times. At first, the program simply placed the black and white game pieces, called stones, at random points on the board. From those random beginnings, it improved at the game remarkably quickly.
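
To make the idea of self-play concrete, here is a minimal sketch in Python. It substitutes tic-tac-toe for Go and a simple lookup table for the deep neural network and Monte Carlo tree search that AlphaGo Zero actually combines, so it illustrates only the general training loop, not DeepMind's method. All names and parameters below are illustrative: the program starts out playing at random, plays thousands of games against itself, and nudges its value estimates toward whatever actually won.

import random

# Toy self-play learner: tic-tac-toe stands in for Go, and a lookup-table
# value function stands in for AlphaGo Zero's neural network and tree search.
# Everything here is an illustrative sketch, not DeepMind's code.

EMPTY, X, O = 0, 1, 2
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

values = {}                 # board state -> estimated chance of winning
EPSILON, ALPHA = 0.1, 0.5   # exploration rate, learning rate

def choose_move(board, player):
    moves = [i for i, cell in enumerate(board) if cell == EMPTY]
    if random.random() < EPSILON:
        return random.choice(moves)   # explore with a random move
    def score(move):                  # otherwise pick the best-looking move
        nxt = list(board)
        nxt[move] = player
        return values.get(tuple(nxt), 0.5)
    return max(moves, key=score)

def self_play_game():
    board, player = [EMPTY] * 9, X
    history = {X: [], O: []}          # states reached after each player's move
    while True:
        board[choose_move(board, player)] = player
        history[player].append(tuple(board))
        if winner(board) or EMPTY not in board:
            return history, winner(board)
        player = O if player == X else X

def learn(states, reward):
    # Back up the final result through the states the player visited.
    target = reward
    for state in reversed(states):
        v = values.get(state, 0.5)
        values[state] = v + ALPHA * (target - v)
        target = values[state]

for _ in range(20000):                # millions of games, scaled way down
    history, w = self_play_game()
    learn(history[X], 1.0 if w == X else (0.5 if w is None else 0.0))
    learn(history[O], 1.0 if w == O else (0.5 if w is None else 0.0))

Go's board is of course far too large for a lookup table like this, which is one reason AlphaGo Zero uses a neural network that can generalize to positions it has never seen before.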


This approach, though it meant the program had to learn the game from scratch, allowed it to come up with strategies that human players have never discovered. This, of course, is quite an accomplishment. Our knowledge of Go comes from thousands of years of play and millions of games, so the fact that AlphaGo Zero could devise new strategies after playing for just a couple of days is remarkable.

AlphaGo Zero learned various josekis, established sequences of moves in which neither side suffers a net loss, some already familiar to human players and some entirely new. As its training continued, the program increasingly favored these previously unknown moves and strategies.


DeepMind tested AlphaGo Zero's newfound skills by pitting it against the earlier incarnation that defeated 18-time world champion Lee Sedol in 2016, a version that had taken months to learn the game. AlphaGo Zero won all 100 games. After 40 days of further training, it also won 89 out of 100 games against the stronger version that defeated Ke Jie.



Beyond Go


Lee Sedol playing against an earlier incarnation of AlphaGo [Image by DeepMind]

However good AlphaGo Zero is at Go, its ultimate goal is not just to keep getting better at the game. “For us, AlphaGo wasn’t just about winning the game of Go,” says DeepMind CEO Demis Hassabis. “It was also a big step for us towards building these general-purpose algorithms.”

At present, AlphaGo Zero has already moved beyond Go: DeepMind says the program is now working on the problem of how proteins fold. Protein folding has long stumped scientists, and solving it could greatly improve drug discovery. The same techniques could also help with quantum chemistry and climate science.

Of course, DeepMind didn’t develop a self-teaching machine just for the sake of it. If an artificial intelligence program can learn by itself, it removes the need for huge amounts of human-generated training data. What AlphaGo Zero now has to prove is that its approach to learning can be effective beyond playing Go.
