This project began as an attempt to apply neural nets to the game of Connect 4, and over time morphed into an attempt to recreate the techniques used in the AlphaGo projects and apply them to Connect 4. The project loosely follows the techniques described in this article and this image.
The agent uses a Monte Carlo tree search (MCTS) driven by two separate neural nets: one evaluates how favourable a given board is, and the other estimates the move probabilities for that board. After a user-selected number of MCTS iterations, the move is selected either deterministically or stochastically, according to variables in config.py. This config file also contains other variables that control aspects of the agent's behaviour, including the use of temperatures during selfplay to ensure the agent explores a diverse variety of game states.
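The difference between deterministic and temperature-based stochastic selection can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the AlphaZero-style convention of sampling moves in proportion to the root visit counts raised to the power 1/temperature, and the names `select_move`, `visit_counts` and `temperature` are hypothetical rather than taken from config.py.

```python
import random


def select_move(visit_counts, temperature, rng=random):
    """Pick a column index from the visit counts gathered at the MCTS root.

    Hypothetical helper: visit_counts[i] is how often the search visited
    the move in column i; temperature <= 0 means "play deterministically".
    """
    if sum(visit_counts) <= 0:
        raise ValueError("at least one move must have been visited")

    columns = range(len(visit_counts))
    best = max(columns, key=lambda i: visit_counts[i])

    if temperature <= 0:
        # Deterministic: always play the most-visited move.
        return best

    # Stochastic: weight each move by N^(1/T). Dividing by the largest
    # count first keeps the powers in [0, 1] and avoids overflow for small T.
    peak = visit_counts[best]
    weights = [(n / peak) ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(columns, weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = [1, 10, 3, 0, 0, 2, 4]  # one entry per Connect 4 column
    print("deterministic:", select_move(counts, temperature=0))
    print("stochastic:   ", select_move(counts, temperature=1.0))
```

Under this assumption, a higher temperature flattens the distribution so that selfplay visits more varied positions, while a temperature near zero approaches the deterministic choice.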
Running run.py presents the user with four options:
- The user can load an existing neural network or create a new one via the terminal. The selfplay learning process then starts automatically and continues until the terminal is closed. Any new champions created during this time are saved under the folder name selected by the user.
- The user can check the results of the selfplay learning by running a tournament between the current and previously saved champions with selfplay.generationTournament().
- The results of the tournament can be saved and subsequently accessed by running selfplay.loadTournamentResults().
- The user can also compare the performance of different neural net structures by running a tournament between two existing neural net pairs using the head2Head.modelShowdown function.