Load test: Qwen2.5-72B-Instruct-GPTQ-Int4 has lower RPS and higher average latency than Qwen2-72B-Instruct-GPTQ-Int4 #1019
kartikzheng
started this conversation in
General
Replies: 1 comment
-
The output lengths may differ.
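One way to check whether output length explains the gap is to normalize latency by the number of generated tokens. The sketch below is only an illustration: `RequestSample`, `summarize`, and the sample numbers are hypothetical and are not measurements from this thread. It assumes you log, for each request, the end-to-end latency and the generated token count (for example from the `usage.completion_tokens` field of an OpenAI-compatible response).

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float       # end-to-end latency of one request, in milliseconds
    completion_tokens: int  # number of tokens the model generated for that request


def summarize(samples):
    """Return mean latency, mean output length, and latency per output token."""
    if not samples:
        raise ValueError("no samples to summarize")
    total_ms = sum(s.latency_ms for s in samples)
    total_tokens = sum(s.completion_tokens for s in samples)
    if total_tokens == 0:
        raise ValueError("no output tokens recorded")
    return {
        "mean_latency_ms": total_ms / len(samples),
        "mean_completion_tokens": total_tokens / len(samples),
        "ms_per_output_token": total_ms / total_tokens,
    }


# Made-up numbers purely for illustration; NOT data from this discussion.
model_a = [RequestSample(4000.0, 200), RequestSample(6000.0, 300)]
model_b = [RequestSample(6000.0, 300), RequestSample(9000.0, 450)]

for name, samples in (("model_a", model_a), ("model_b", model_b)):
    print(name, summarize(samples))
```

In this made-up data, `model_b` has a higher mean latency only because it generates longer outputs; the latency per output token is identical. If the two real models show similar per-token latency, the RPS difference likely comes from response length rather than from serving speed. Capping `max_tokens` to the same value for both models is another way to make the comparison fairer.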
-
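I used locust to load test the API served by vLLM and found that Qwen2.5-72B-Instruct-GPTQ-Int4 has a lower RPS and a higher average latency than Qwen2-72B-Instruct-GPTQ-Int4. The detailed results are as follows:
Test setup: load test duration of 5 minutes; 5 users spawned per second.
Test environment: H800, two 40G cards
Unit: milliseconds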