OOM 8xH100 using latest GRPO code with vLLM #2688
Comments
I'm getting the same error on a 4xL4 instance (4 x 24 GB of CUDA RAM) with Llama3.2-1b. vLLM loads this model on a single GPU (24 GB) with no problems, but it fails with the same OOM error as above when launched through accelerate.
Hey @abacaj! Can you please share a gist / command that reproduces the error? (Easier for us to debug that way.)
Try reducing vllm_gpu_memory_utilization:

from trl import GRPOConfig
GRPOConfig(..., vllm_gpu_memory_utilization=0.7)
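(For context, not stated in the thread: as far as I understand, this TRL setting maps onto vLLM's own `gpu_memory_utilization` engine argument, which caps the fraction of the device vLLM reserves for weights and KV cache. A rough, hedged illustration of the underlying knob in standalone vLLM; the model ID below is a placeholder, not the one from this issue:)

```python
# Sketch of the vLLM-level knob the TRL setting corresponds to
# (standalone vLLM usage; model ID is a placeholder, not from this issue).
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",
    gpu_memory_utilization=0.7,  # reserve at most ~70% of the device for weights + KV cache
)
```

Lowering the fraction leaves headroom for whatever else lives on the same GPU (CUDA graphs, NCCL buffers, or a colocated training process).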
@qgallouedec
I mean, can we use any reward model, or do we need to fine-tune one specially for our specific domain?
It's probably old code from the time when GRPO was not compatible with reward functions. Is this code still accessible somewhere?
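(Side note on the reward question above: in current TRL, GRPOTrainer's `reward_funcs` accepts either a trained reward model referenced by its Hub ID or a plain Python callable, so a dedicated reward model is optional. A minimal sketch; the scoring rule and the model ID in the comment are illustrative only:)

```python
# Option 1: a plain Python reward function -- no separate reward model required.
# GRPOTrainer passes the generated completions (plus extra dataset columns as kwargs)
# and expects one float per completion.
def reward_unique_words(completions, **kwargs):
    """Toy heuristic: reward less repetitive completions (illustration only)."""
    scores = []
    for text in completions:
        words = text.split()
        scores.append(len(set(words)) / max(len(words), 1))
    return scores

# Option 2: a trained, domain-specific reward model passed by Hub ID, e.g.
#   GRPOTrainer(..., reward_funcs="your-org/your-domain-reward-model")
# (placeholder ID, not a real model from this thread)
```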
Reproduction
The model is 8B.
Training works fine with DeepSpeed when vLLM is disabled; when I enable vLLM alongside DeepSpeed, I get an OOM on the vLLM device while the model is loading.
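A hedged sketch of the kind of setup being described, for orientation only. The actual training script, DeepSpeed config, and launch command are not included in this report; every name below (script, dataset, model, reward function) is a placeholder, and the config fields assume a recent TRL release:

```python
# train_grpo.py -- illustrative only, not the reporter's actual script.
# Launched with something along the lines of:
#   accelerate launch --config_file deepspeed_zero3.yaml train_grpo.py
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Example dataset used in the TRL GRPO docs; stands in for the real training data.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 200 characters (illustration only).
    return [-abs(200 - len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-8b-vllm",        # placeholder
    bf16=True,
    use_vllm=True,                    # generation is offloaded to vLLM
    vllm_gpu_memory_utilization=0.7,  # lower this if the vLLM device OOMs at load time
)

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # stand-in for the 8B model in the report
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```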
System Info
Checklist