We present a simple experiment for analyzing the diffusion of information through a group of LLM agents. One agent starts with private information about the correct answer, while the rest guess at random. Agents interact in pairs over multiple rounds to share information and potentially update guesses based on information received from others.
In the ideal case, through pairwise interactions without replacement, this correct answer should rapidly spread through the group. The simulation explores whether groups of LLM agents can converge to a unanimous decision on the right answer, and, if so, how quickly.
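For intuition, here is a minimal sketch of such a pairing scheme (the helper name is hypothetical; the project's actual logic lives in `debate_manager.py`): each round the agents are shuffled and split into disjoint pairs, so every agent talks to exactly one partner per round.

```python
import random

def pair_agents(agents):
    """Randomly pair agents without replacement for one round of discussion.

    Assumes an even number of agents; returns a list of (agent_a, agent_b) pairs.
    """
    shuffled = agents[:]
    random.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]

# Example: 4 agents give 2 pairs per round
pairs = pair_agents(["agent_0", "agent_1", "agent_2", "agent_3"])
print(pairs)
```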
The project consists of three Python scripts:
- `debate_manager.py` defines the `DebateManager` class, which manages the simulation of a group debate in which agents exchange information over multiple rounds via pairwise interactions.
- `agent.py` defines the `Agent` class, which models an LLM-based agent that decides, via a prompt and API call, whether to update its guess based on information received from a partner (see the sketch after this list).
- `main.py` contains the `main` function, which specifies the adjustable parameters and runs multiple iterations of the simulation when executed. It also includes the plotting functions.
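For illustration, a rough sketch of how an `Agent` might decide whether to adopt a partner's answer; the method names, prompt wording, and stubbed LLM call here are assumptions, not the project's actual implementation.

```python
class Agent:
    """Toy sketch of an LLM-backed agent holding a numeric guess (hypothetical interface)."""

    def __init__(self, agent_id, guess, query_llm):
        self.agent_id = agent_id
        self.guess = guess
        self.query_llm = query_llm  # callable: prompt string -> response string

    def hear(self, message):
        """Ask the LLM whether to keep the current guess or adopt the partner's answer."""
        prompt = (
            f"You currently guess the answer is {self.guess}. "
            f"Another agent told you: '{message}'. "
            "Reply with only the number you now believe is correct."
        )
        reply = self.query_llm(prompt)
        try:
            self.guess = int(reply.strip())
        except ValueError:
            pass  # keep the current guess if the reply is not a number


# Example with a stubbed LLM that always trusts the partner
agent = Agent("agent_3", guess=42, query_llm=lambda prompt: "17")
agent.hear("I heard from a reliable source that the answer is 17.")
print(agent.guess)  # 17
```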
Adjustable parameters in the `main` function are (a sketch of these settings follows the list):

- `NUM_AGENTS` - Group size. Given the pairwise interactions, this should be an even integer (tested up to N = 16).
- `NUM_ROUNDS` - Number of discussion rounds per simulation (tested with 8 rounds).
- `NUM_SIMULATIONS` - Number of simulations to run for each combination of parameters. For plotting, each run is shown in a light color, with the mean across runs overlaid as a darker line and dots (current plots show results from 6 runs).
- `CORRECT_ANSWER` - Private information initially given to one agent. Currently an integer from 1-100 (the guessing range of the other agents).
- `TEMPERATURE` - Hyperparameter controlling the determinism of LLM responses (lower values are more deterministic). For OpenAI models, this ranges from 0.0 to 2.0 (tested at 0.2 and 1.2).
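As a rough illustration only (the constant names follow the list above; the specific values are arbitrary examples drawn from the tested settings), the parameter block in `main.py` might look like:

```python
# Hypothetical parameter block in main.py, using values mentioned above.
NUM_AGENTS = 16        # even group size (tested up to 16)
NUM_ROUNDS = 8         # discussion rounds per simulation
NUM_SIMULATIONS = 6    # runs per parameter combination
CORRECT_ANSWER = 42    # private information given to one agent (1-100)
TEMPERATURE = 0.2      # OpenAI temperature, 0.0-2.0 (tested at 0.2 and 1.2)
```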
| | N = 4 Agents | N = 16 Agents |
|---|---|---|
| Temp = 0.2 | ![]() | ![]() |
| Temp = 1.2 | ![]() | ![]() |
We found that overall performance (measured as the proportion of agents holding the correct answer and the time to convergence) decreased with larger group sizes and higher temperatures. We also observed the spontaneous emergence of misinformation (agents claiming to have heard the wrong answer from a reliable source), which was likewise associated with higher temperatures. See the Project Report for more results and discussion.
This project requires Python 3.9+ and the `dotenv` (installed as `python-dotenv`), `matplotlib`, `numpy`, and `requests` packages, which can be installed using pip:
```
pip install python-dotenv matplotlib numpy requests
```
This project requires OpenAI API access for the GPT models and can be modified to use other LLMs. The corresponding API keys should be provided as environment variables in a `.env` file (ignored by git for security) in the project or home directory. This should be a text file with key=value pairs, e.g.:
```
OPENAI_API_KEY="sk-..."
...
```
These variables are imported from the `.env` file using the `dotenv` package:
```python
import dotenv

dotenv.load_dotenv()
```
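Once loaded, the key can be read from the environment and attached to API calls. Below is a minimal sketch using the `requests` package against the OpenAI chat completions endpoint; the model name, prompt, and payload are illustrative and not necessarily what the scripts send.

```python
import os

import requests

# Read the key loaded from .env and send a single chat completion request.
api_key = os.getenv("OPENAI_API_KEY")
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gpt-4o-mini",          # illustrative model choice
        "temperature": 0.2,
        "messages": [{"role": "user", "content": "Reply with a number from 1-100."}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```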
To run, execute the `main.py` script:
```
python main.py
```
This project was developed by Matthew Lutz and Nyasha Duri as part of the Multi-Agent Security Research Sprint organized by Apart Research. See our Project Report, which includes preliminary results, for more details.