Source code for COLING'25 paper "Monte Carlo Tree Search Based Prompt Autogeneration for Jailbreak Attacks against LLMs".
Two datasets are included, the AdvBench subset and MaliciousInstruct, both in the data directory.
To get started, install the dependencies: pip install fschat==0.2.23 transformers openai anthropic
For experiments on GPT models, make sure the OPENAI_API_KEY environment variable is set.
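A quick pre-flight check can catch a missing key before a long run starts. The snippet below is a minimal sketch, not part of this repository: the helper name missing_env_vars is hypothetical, and it assumes the key is read from the process environment, which is the default behavior of the openai Python client.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the real process environment (os.environ).
    This helper is illustrative and not part of the repository.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
# an empty value is reported as missing.
print(missing_env_vars(["OPENAI_API_KEY"], env={"OPENAI_API_KEY": ""}))
# prints ['OPENAI_API_KEY']
```

Calling missing_env_vars(["OPENAI_API_KEY"]) with no env argument checks the actual environment; an empty list means the key is available.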
The run scripts for the experiments are in experiments and experiments_M100, corresponding to the two datasets. An example run command is bash experiments/exps_llama3_8b.sh.
There is one main file:
main.py: runs MCTS on all models with logprobs (HuggingFace and GPT models).
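For readers unfamiliar with MCTS, the selection and backpropagation steps of the textbook UCT variant can be sketched as follows. This is a generic illustration on an abstract tree, not the code in main.py: the Node class, the function names, the exploration constant c=1.4, and the scalar reward are all assumptions made for the example.

```python
import math


class Node:
    """A tree node holding visit statistics (hypothetical, for illustration)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def add_child(self):
        child = Node(parent=self)
        self.children.append(child)
        return child


def uct_score(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus; unvisited children come first."""
    if child.visits == 0:
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node, c=1.4):
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(node, reward):
    """Add one visit and the reward to the node and all of its ancestors."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

In a full search loop, select_child is applied repeatedly from the root down to a leaf, the leaf is expanded and scored, and backpropagate pushes that score back up the path.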