This project presents an application of Monte Carlo Tree Search (MCTS) to “Pacman Capture The Flag”, a multi-player variant of the popular game in which agents control both Pacman and ghosts as part of coordinated team-based strategies. After analyzing the game environment, we guide the search with a selection strategy that balances exploration and exploitation, allowing us to evaluate states and decide which nodes to expand during the search process. In addition to the MCTS agent, we implement other, non-tree-search agents based on heuristic methods and document a comparative analysis by running a local tournament to gain insight into their performance.
Monte Carlo Tree Search (MCTS) is a decision-making algorithm for searching combinatorial spaces represented as trees. It explores the space of actions through random sampling (simulations), stores statistics about the outcomes, and uses them to make increasingly informed choices in subsequent iterations. For this reason, it is well suited to sequential decision-making problems such as video games, where an agent must make a series of choices, each of which influences the future state of the game. In complex games, MCTS can be particularly effective thanks to its ability to handle large state spaces with many possible actions, where traditional search algorithms may be computationally expensive or infeasible. The advantages of MCTS include its flexibility and efficiency. It is an anytime algorithm, meaning it can be stopped at any point and return the best solution found so far; this makes it suitable for the real-time constraints common in video games. MCTS also balances exploration and exploitation during the search, ensuring that the algorithm does not focus too narrowly on one part of the search space. Furthermore, MCTS does not require a heuristic function, yet it can be enhanced with domain knowledge in various ways, making it versatile and applicable to a wide range of problems. In the context of the Pacman Capture the Flag Contest, MCTS can be used to guide the actions of both the Pacman and the ghost. The game involves two teams, each controlling a Pacman and a ghost, competing against each other. The state space includes the positions of all characters and the remaining food, and the actions consist of moving the characters in different directions. MCTS can simulate different sequences of actions, evaluate their outcomes, and select the most promising actions to execute. The algorithm can be run separately for the Pacman and the ghost, allowing them to coordinate their actions and work together to beat the opposing team. The flexibility and efficiency of MCTS make it a suitable choice for this complex and dynamic game environment.
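As a point of reference, the sketch below outlines the four canonical MCTS phases (selection with UCB1, expansion, random simulation, backpropagation) in Python. The Node fields and the state interface (get_legal_actions, generate_successor, is_terminal, evaluate) are illustrative assumptions, not the contest API.

```python
import math
import random

class Node:
    """A single node of the search tree (illustrative sketch)."""
    def __init__(self, state, parent=None, action=None):
        self.state = state      # game state reached at this node
        self.parent = parent    # parent node (None for the root)
        self.action = action    # action that led here from the parent
        self.children = []      # expanded child nodes
        self.visits = 0         # N(v): number of visits
        self.value = 0.0        # Q(v): cumulative reward

    def is_fully_expanded(self):
        return len(self.children) == len(self.state.get_legal_actions())

def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score balancing exploitation and exploration."""
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root_state, iterations=100):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend via UCB1 until a node is not fully expanded.
        node = root
        while node.is_fully_expanded() and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one unexplored child if the state is not terminal.
        if not node.state.is_terminal():
            tried = {ch.action for ch in node.children}
            untried = [a for a in node.state.get_legal_actions() if a not in tried]
            action = random.choice(untried)
            child = Node(node.state.generate_successor(action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout to a terminal state (default policy).
        state = node.state
        while not state.is_terminal():
            state = state.generate_successor(random.choice(state.get_legal_actions()))
        reward = state.evaluate()
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the action of the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).action
```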
The Pacman Capture the Flag Contest is a simple yet strategic game where two teams, red and blue, compete against each other. The goal of each team is to eat as many food pellets as possible from the opposing side while avoiding the opposing ghost. When Pacman consumes food pellets, they are stored inside it and removed from the game board; the team scores one point per pellet when Pacman returns to its original side. If Pacman is eaten by an opposing ghost, the stored food pellets are scattered around the position where Pacman was caught. The game also includes power capsules, which, when consumed by Pacman, cause the opposing team’s agents to enter a “scared” mode for 40 moves or until they are eaten by the powered-up Pacman. Eaten “scared” ghosts respawn in their original position and normal state. The game concludes in one of two ways: either one team returns all but two of the opponents’ dots, or the game reaches a set number of total moves (by default 1200 moves, i.e., 300 moves per agent), in which case the team that has returned the most food pellets wins. In the context of our task, these rules and mechanics form the basis of the state and action space that the MCTS algorithm explores. Each state in the MCTS tree represents a configuration of the game, including the positions of all characters and the remaining food pellets, and each action represents a possible move for Pacman or the ghost. The MCTS algorithm simulates different sequences of actions, evaluates their outcomes, and selects the most promising actions to execute, guiding the Pacman and ghost agents towards strategic decisions that maximize their score and their chances of winning the game.
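To make the state space concrete, the following sketch shows one possible representation of a game configuration together with the end conditions described above; the field names and the simplified terminal test are assumptions made for illustration, not the contest codebase.

```python
from dataclasses import dataclass, field

@dataclass
class CaptureState:
    """Illustrative game-state container (hypothetical, not the contest API)."""
    agent_positions: dict                       # agent index -> (x, y) in the maze
    food_red: set                               # pellets remaining on the red half
    food_blue: set                              # pellets remaining on the blue half
    carried: dict = field(default_factory=dict) # agent index -> pellets being carried
    score: int = 0                              # positive favours red, negative blue
    moves_played: int = 0
    move_limit: int = 1200                      # default: 300 moves for each of 4 agents

    def is_terminal(self):
        # End conditions described above: one team has returned all but two of
        # the opponents' pellets (approximated here by pellets left on the
        # board), or the total move limit has been reached.
        return (len(self.food_red) <= 2
                or len(self.food_blue) <= 2
                or self.moves_played >= self.move_limit)
```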
In our implementation of MCTS for the Pacman game, we operate under several key assumptions that simplify the task and guide our decision-making process. Firstly, we ensure that our agents have access to complete information about the game state, including the maze’s layout and the positions of the enemies. While this information is not directly provided to our agents, it enables us to make informed decisions by considering all relevant aspects of the game environment. Regarding the algorithmic approach, we first developed a standard MCTS using the UCB formula for best-child selection and a default policy (uniformly random actions) for the rollout. We found that this standard approach needs a huge number of iterations to decide effectively which action to perform from the current game state, and even then the chosen action was sometimes not the best one. To optimize the process in our environment, we introduced several parameters and heuristics, mainly to lower the computational cost and to reduce the high variance and randomness in the outcomes of individual rollouts. To illustrate, if we set a limit of n iterations, then for each move made by our agents during the game the MCTS algorithm builds a tree with at most n nodes, excluding the root, keeping the expansion of the tree under control. Moreover, by setting a maximum depth for the simulation phase, we allow MCTS to stop rollouts before reaching a final game state, which reduces the computational time of each sampled reward. Both of these measures help manage the computational complexity inherent in the game’s large state space. Lastly, introducing heuristics was necessary to evaluate the game state reached by an agent, since rollouts no longer end in a final state, and also to prioritise certain nodes and actions in the simulation phase instead of performing purely random actions. Regarding rewards and state evaluation, each agent collects different rewards depending on its behaviour (offensive or defensive). For example, the offensive agent (Pacman) receives high rewards for collecting food pellets or reaching the power capsule, while the defensive agent (ghost) receives high rewards for eating the opponent or for staying near the power capsule we want to defend. During the simulation phase, the offensive agent, whenever possible, always chooses the action that minimizes its distance to the power capsule in the enemy field, while the defensive agent always chooses the action that minimizes its distance to the opponents. Because we act in a multi-agent environment, we also want to take the enemies’ actions into account during the simulation. Unlike our agents, opponents do not employ sophisticated strategies; their moves are simulated as random actions. This approach resembles a minimax setting in which we do not assume the opponent plays its best action. This simplifying assumption helps the employed heuristics find the agents’ actions while still accounting for potential opponent movements. The goal of our MCTS algorithm is to find the best action in the current game state, with the aim of winning.
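The sketch below illustrates the depth-limited, heuristic-guided rollout just described: our agent moves greedily towards a role-dependent target while opponents move randomly, and the reached state is scored heuristically. The state interface (get_legal_actions, generate_successor, maze_distance, enemy_capsule, opponent_positions, my_position, my_index, opponent_indices, evaluate) is an assumed placeholder, not the contest API.

```python
import random

def heuristic_rollout(state, role, max_depth=20):
    """Depth-limited rollout with a greedy, role-dependent policy (sketch)."""
    for _ in range(max_depth):
        if state.is_terminal():
            break
        # Our agent: greedy, role-dependent move.
        if role == "offensive":
            target = state.enemy_capsule()            # power capsule on the enemy side
        else:
            target = min(state.opponent_positions(),  # closest opponent
                         key=lambda p: state.maze_distance(state.my_position(), p))
        me = state.my_index()
        best = min(state.get_legal_actions(me),
                   key=lambda a: state.maze_distance(
                       state.generate_successor(me, a).my_position(), target))
        state = state.generate_successor(me, best)
        # Opponents: uniformly random moves, as in our simplifying assumption.
        for enemy in state.opponent_indices():
            if state.is_terminal():
                break
            state = state.generate_successor(
                enemy, random.choice(state.get_legal_actions(enemy)))
    # Rollouts may stop before a real end state, so the reached state is
    # scored with the role-dependent heuristic reward.
    return state.evaluate(role)
```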
To evaluate the performance of all the agents, we designed a local tournament that incorporates every possible team combination from a set of five distinct agents: the Monte Carlo Tree Search agent, the Defensive agent, the Offensive agent, the Reflexive Defensive agent, and the Reflexive Offensive agent. This tournament structure ensures that each team, formed by pairing these agents in every possible configuration, competes against every other team configuration, playing once on the red side and once on the blue side to ensure balance in the evaluation. This balanced, round-robin format allows us to gather insights into the performance dynamics of the agents in a variety of team settings, offering a nuanced understanding of their individual and cooperative strengths, weaknesses, and strategic value.
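As a concrete illustration of this schedule, the sketch below enumerates all two-agent team compositions and pairs every team with every team on both sides. Treating a team as an unordered pair of distinct agents, and letting identical compositions meet each other, are assumptions of the illustration.

```python
from itertools import combinations, product

# The five agent types used in the local tournament.
AGENTS = ["MCTS", "Defensive", "Offensive", "ReflexDefensive", "ReflexOffensive"]

# Every two-agent team composition (assumed: two distinct agents per team).
teams = list(combinations(AGENTS, 2))

# product() pairs every team with every team, so each composition appears the
# same number of times on the red side and on the blue side.
schedule = [(red, blue) for red, blue in product(teams, teams)]

print(len(teams), "teams,", len(schedule), "scheduled games")
```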
- Team Assessment: We assess team performance by examining the Win-Draw-Loss Record of each agent combination throughout the tournament. Tracking each team’s win-draw-loss record across all games gives us a clear, straightforward indication of overall team performance. Alongside this, we evaluate the Point Differential of each game, i.e., the difference in points scored between the winning and losing teams. This metric indicates how dominant a team is in its victories or, conversely, how narrow its losses are, offering insight into the competitive balance of the tournament.
- Agent Assessment: To determine the individual contribution of each agent, we analyze Agent-Specific Win Rates by calculating the win rate of each agent across all games in which it participates. An agent that consistently features in winning teams, and thus shows a significantly higher win rate, is identified as potentially having a more substantial impact on its team’s success. This assessment is crucial for understanding the value of each agent’s contribution, even when it is not directly quantifiable through scoring metrics. By focusing on win rates, we can infer the effectiveness of each agent in contributing to a team’s strategic and competitive advantages within the game’s environment. A sketch of how both the team metrics and the agent-specific win rates can be computed follows this list.
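The sketch below shows how the Win-Draw-Loss Record, Point Differential, and Agent-Specific Win Rates could be derived from raw game results. The results list and its format (red team, blue team, red score minus blue score) are made-up placeholders, not our tournament data.

```python
from collections import defaultdict

# Hypothetical raw results: (red team, blue team, score margin for red).
results = [
    (("Offensive", "Defensive"), ("MCTS", "ReflexDefensive"), 5),
    (("MCTS", "Offensive"), ("ReflexOffensive", "ReflexDefensive"), -2),
    (("Offensive", "Defensive"), ("ReflexOffensive", "MCTS"), 0),
]

team_record = defaultdict(lambda: {"W": 0, "D": 0, "L": 0, "diff": 0})
agent_games = defaultdict(int)
agent_wins = defaultdict(int)

for red, blue, margin in results:
    red_key, blue_key = frozenset(red), frozenset(blue)
    # Win-Draw-Loss Record and cumulative Point Differential per team.
    if margin > 0:
        team_record[red_key]["W"] += 1
        team_record[blue_key]["L"] += 1
    elif margin < 0:
        team_record[red_key]["L"] += 1
        team_record[blue_key]["W"] += 1
    else:
        team_record[red_key]["D"] += 1
        team_record[blue_key]["D"] += 1
    team_record[red_key]["diff"] += margin
    team_record[blue_key]["diff"] -= margin
    # Agent-Specific Win Rate: wins / games played, across all team contexts.
    for agent in red:
        agent_games[agent] += 1
        agent_wins[agent] += margin > 0
    for agent in blue:
        agent_games[agent] += 1
        agent_wins[agent] += margin < 0

win_rate = {a: agent_wins[a] / agent_games[a] for a in agent_games}
print(win_rate)
```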
- Team performance: The evaluation of the performance of our teams is presented below. The colours in the table highlight the areas of landslide victories. In many cases the game ends in a tie, meaning that the teams are mostly evenly matched. All the results, including the number of wins, draws, losses, and score differences, are presented in Table 1, calculated over 20 games played by each team. Here, a negative score means that the Red team lost against its opponent (the Blue team). The best performance can be attributed to the team composed of the Defensive and Offensive Agents. From this we can deduce that the heuristic approach is the best-performing one in our implementation, thanks to its efficiency and immediacy. On the other hand, the teams losing the most matches are those including the MCTS Offensive Agent, whereas the MCTS ghost performs well on defense when paired with a heuristic Offensive Agent. This behaviour can be attributed to the different tasks allocated to Pacman and the ghost. While the Defensive agent’s main goal is to protect its own area by stopping and eating the enemy, the Offensive agent is in charge of multiple objectives: it has to reach the power capsules, eat the pellets, and then return to its own region of the maze. Collecting rewards therefore becomes more involved, which makes the evaluation of actions more complex and the convergence of the algorithm to the optimum more difficult.
- Agent performance: To quantify the performance of our agents, we calculate their scores using TrueSkill, a rating system used in multiplayer games to rank players according to their skill level. Each rating is a Gaussian distribution whose mean represents the estimated skill of a player and whose standard deviation represents the uncertainty of that estimate. As we can see in the Fig. below, the best-performing agents are the heuristic ones, with the Offensive agent taking first place. This indicates that the greedy heuristics we chose work well against “simple” agents. However, these greedy approaches might fail against more sophisticated agents; for instance, they might be vulnerable to agents that know how to use the power capsules to their advantage. The MCTS agent performs ∼30% worse than our best option.
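As an illustration of how such ratings can be obtained, the sketch below updates per-agent TrueSkill ratings from team results using the open-source trueskill Python package. The results list is a made-up placeholder, not our actual tournament log.

```python
import trueskill  # pip install trueskill

# One default rating (mu=25, sigma≈8.33) per agent type.
ratings = {name: trueskill.Rating() for name in
           ["MCTS", "Defensive", "Offensive", "ReflexDefensive", "ReflexOffensive"]}

# Hypothetical results: (red team, blue team, red score minus blue score).
results = [
    (("Offensive", "Defensive"), ("MCTS", "ReflexDefensive"), 5),
    (("MCTS", "Offensive"), ("ReflexOffensive", "ReflexDefensive"), 0),
]

for red, blue, margin in results:
    red_group = {a: ratings[a] for a in red}
    blue_group = {a: ratings[a] for a in blue}
    # Lower rank means better placement; equal ranks encode a draw.
    ranks = [0, 1] if margin > 0 else ([1, 0] if margin < 0 else [0, 0])
    new_red, new_blue = trueskill.rate([red_group, blue_group], ranks=ranks)
    ratings.update(new_red)
    ratings.update(new_blue)

# Each rating is a Gaussian: mu is the estimated skill, sigma the uncertainty.
for name, r in sorted(ratings.items(), key=lambda kv: kv[1].mu, reverse=True):
    print(f"{name}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```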