# stopwatch

A simple solution for benchmarking vLLM, SGLang, and TensorRT-LLM on Modal with guidellm. ⏱️

## Setup

Install dependencies:

```bash
pip install -r requirements.txt
```

## Run benchmark

To run a single benchmark, you can use the `run-benchmark` command, which will save your results to a local file. For example, to run a synchronous-rate benchmark with vLLM:

```bash
MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
OUTPUT_PATH=results.json

modal run -w $OUTPUT_PATH cli.py::run_benchmark --model $MODEL --llm-server-type vllm
```

Or, to run a fixed-rate multi-GPU benchmark with SGLang:

```bash
GPU_COUNT=4
MODEL=meta-llama/Llama-3.3-70B-Instruct
REQUESTS_PER_SECOND=5

modal run -w $OUTPUT_PATH cli.py::run_benchmark --gpu "H100:$GPU_COUNT" --model $MODEL --llm-server-type sglang --rate-type constant --rate $REQUESTS_PER_SECOND --llm-server-config "{\"extra_args\": [\"--tp-size\", \"$GPU_COUNT\"]}"
```
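
Escaping the JSON for `--llm-server-config` by hand can be error-prone. As a minimal sketch (plain Python, not part of stopwatch), the same value can be built with `json.dumps`:

```python
# Sketch: build the --llm-server-config value with json.dumps instead of
# escaping quotes by hand. Assumes the value is plain JSON containing an
# "extra_args" list, as in the SGLang example above.
import json

gpu_count = 4
server_config = json.dumps({"extra_args": ["--tp-size", str(gpu_count)]})
print(server_config)  # -> {"extra_args": ["--tp-size", "4"]}
```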

Or, to run a throughput test with TensorRT-LLM:

```bash
modal run -w $OUTPUT_PATH cli.py::run_benchmark --model $MODEL --llm-server-type tensorrt-llm --rate-type throughput
```
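
The saved file appears to be JSON (as the `results.json` name suggests), but its exact structure depends on guidellm. The following is only a rough sketch for peeking at what was saved:

```python
# Sketch: take a quick look at the saved benchmark results without
# assuming a particular guidellm schema.
import json

with open("results.json") as f:
    results = json.load(f)

if isinstance(results, dict):
    # Print the top-level keys and the type of each value.
    for key, value in results.items():
        print(f"{key}: {type(value).__name__}")
else:
    print(f"Top-level {type(results).__name__} with {len(results)} entries")
```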

## Run and plot multiple benchmarks

To run multiple benchmarks at once, first deploy the project:

```bash
modal deploy -m stopwatch
```

Then, use the `run-benchmark-suite` command along with a configuration file:

```bash
python cli.py run-benchmark-suite configs/data-distributions.yaml
```

Once the suite has finished, you will be prompted to open a link to a Datasette UI with your results.

## Run profiler

To profile vLLM with the PyTorch profiler, use the following command:

```bash
python cli.py run-profiler --model meta-llama/Llama-3.1-8B-Instruct --num-requests 10
```

Once profiling is done, you will be prompted to download the generated trace and reveal it in Finder. Keep in mind that generated traces can get very large, so it is recommended to send only a few requests while profiling. Traces can then be visualized at https://ui.perfetto.dev.

## License

Stopwatch is available under the MIT license. See the LICENSE file for more details.
