Skip to content

Files

Latest commit

Sep 30, 2023
5574287 · Sep 30, 2023

History

History
This branch is 26 commits behind FasterDecoding/Medusa:main.

llm_judge

LLM Judge

| Original Github Repository

Installation

| Guide

Usage

We report the 3 times running results of the Medusa X Vicuna v1.3 7/13/33b on a single A100 in ./data/mt_bench/model_answer/. The original settings are: temperature (it is deprecated and use the default LLM Judge setting), posterior_threshold=0.09, posterior_alpha=0.3.

  • Run benchmark
export CUDA_VISIBLE_DEVICES=0 # set the GPU id
python gen_model_answer_medusa.py  --model-path FasterDecoding/medusa-vicuna-7b-v1.3 --model-id medusa-vicuna-7b-v1.3-0
python gen_model_answer_medusa.py  --model-path FasterDecoding/medusa-vicuna-13b-v1.3 --model-id medusa-vicuna-13b-v1.3-0
python gen_model_answer_medusa.py  --model-path FasterDecoding/medusa-vicuna-33b-v1.3 --model-id medusa-vicuna-33b-v1.3-0
  • Run baseline: replace gen_model_answer_medusa.py with gen_model_answer_baseline.py (Please note we only implement the greedy inference for wall-time comparison. If you want to use the sampling generator, please refer to the original repository.)

  • Query the results

export OPENAI_API_KEY=$OPENAI_API_KEYs # set the OpenAI API key
python gen_judgement.py --model-list medusa-vicuna-7b-v1.3-0-temperature-0.0-posterior_threshold-0.09-posterior_alpha-0.3 
  • Show results

To obtain the results of GPT-4 judge for Vicuna-7b ( Huggingface greedy | Huggingface sampling | Medusa sampling), run:

python show_result.py

Citation

Please cite the original paper if you find the code or datasets helpful.

@misc{zheng2023judging,
      title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena}, 
      author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric. P Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
      year={2023},
      eprint={2306.05685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}