benchmarks
#1204
Hi,

I have seen your benchmarks at https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html and tried to reproduce them inside WSL2. With Qwen2.5-1.5B-Instruct and Transformers I get a similar speed, but with vLLM I only get around 56 tok/s, and through the vLLM API (API v0 or v1, inside WSL) I get a far worse result of about 5 tok/s.

Could you please detail the setup?
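For reference, a minimal sketch of one way to measure decode throughput with vLLM's offline API; the model name, prompt, and max_tokens are illustrative placeholders, and the official benchmark page uses its own script, so numbers will not match it exactly.

```python
# Minimal throughput sketch using vLLM's offline API (illustrative values).
import time

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
# ignore_eos=True forces a fixed-length decode so tok/s is comparable.
params = SamplingParams(temperature=0.0, max_tokens=512, ignore_eos=True)

start = time.perf_counter()
outputs = llm.generate(["Write a short essay about large language models."], params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
```

A single prompt measures latency-bound decode speed; batching many prompts would instead measure aggregate throughput, which is a different number.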
-

What's your setup, e.g. GPU? Regarding "vllm api -api v0 or v1": I'm not sure what this means. As for the benchmark setup, it's listed at the beginning of the page you referenced; not much can be added, except that the OS is a Linux distro.
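If "v0 or v1" refers to vLLM's V0/V1 engine, recent vLLM releases switch between the two via the VLLM_USE_V1 environment variable; a sketch under that assumption:

```python
# Assumption: "v0 or v1" means vLLM's V0/V1 engine, which recent vLLM
# releases select through the VLLM_USE_V1 environment variable.
import os

os.environ["VLLM_USE_V1"] = "0"  # "0" = V0 engine, "1" = V1 engine

from vllm import LLM  # set the variable before vllm is imported

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
```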
-
I have an RTX 4060 GPU and an Intel Iris Xe iGPU, with a Core i5-13500H CPU. I am not sure, but I think inference runs on the CPU; I don't really know how to verify that. My vLLM code is: "vllm api -api v0 or v1".
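One generic way to check whether the GPU is visible at all from inside WSL2 (a sketch, not specific to this setup):

```python
# Quick check that CUDA is visible to PyTorch inside WSL2.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```

Watching nvidia-smi in a second terminal while a generation runs is another quick check: if GPU utilization and memory stay near zero, inference is on the CPU. As far as I know, vLLM's default CUDA build refuses to start without a visible GPU, whereas Transformers silently runs on CPU unless the model is moved with .to("cuda") or a device_map.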