"None of the inputs have requires_grad=True" with online DPO and GRPO #2671
Comments
Same issue in open R1:
This part for gradient checkpointing is in the other TRL trainers, but not in the online DPO and GRPO trainers:
Probably an easy fix?
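For context, the warning in this issue comes from PyTorch's reentrant checkpointing: when none of the tensor inputs to a checkpointed block require grad, no graph is recorded for that block. Here is a minimal plain-PyTorch sketch of that mechanism and of a common workaround, a forward hook that marks the embedding output as requiring grad. This is not the TRL code referred to above. The toy modules, sizes, and the function name `checkpointed_layer_gets_grad` are made up for illustration, and it assumes frozen embeddings (as with an adapter).

```python
import warnings

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


def checkpointed_layer_gets_grad(make_inputs_require_grad: bool) -> bool:
    """Return True if the checkpointed layer receives a gradient."""
    torch.manual_seed(0)
    embed = nn.Embedding(10, 4)
    embed.weight.requires_grad_(False)  # assumption: frozen embeddings
    block = nn.Linear(4, 4)             # trainable layer run under checkpointing
    head = nn.Linear(4, 1)              # trainable layer outside the checkpoint

    if make_inputs_require_grad:
        # Workaround: force the embedding output to require grad so the
        # reentrant checkpoint sees at least one input that needs gradients.
        embed.register_forward_hook(
            lambda module, inputs, output: output.requires_grad_(True)
        )

    hidden = embed(torch.tensor([[1, 2, 3]]))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the UserWarning in this demo
        out = checkpoint(block, hidden, use_reentrant=True)

    # The loss still requires grad through `head`, so backward() runs,
    # but the checkpointed block may silently receive no gradient.
    head(out).sum().backward()
    return block.weight.grad is not None


print("without hook:", checkpointed_layer_gets_grad(False))
print("with hook:   ", checkpointed_layer_gets_grad(True))
```

Without the hook, `backward()` still runs because the layer outside the checkpoint is trainable, but the checkpointed layer gets no gradient, which would be consistent with a loss that never improves.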
Probably, but I'm not sure I understand the fix at this point.
I saw that Philipp uses gradient checkpointing in the following tutorial: I tried it, but it doesn't work either. Gradient checkpointing in this tutorial doesn't trigger the warning because use_reentrant is set to False instead of True. I might be wrong, but I think the non-reentrant variant is not implemented in Qwen (and most LLMs). If so, the consequence is that it consumes as much memory as if gradient checkpointing were disabled.
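To make the use_reentrant difference concrete, here is a small plain-PyTorch sketch that runs the same checkpointed layer in both modes with an input that does not require grad. It only shows whether the warning fires and whether gradients reach the checkpointed layer. It says nothing about memory use or about how a particular model such as Qwen wires up checkpointing. The toy modules and the function name `warning_and_grad` are hypothetical.

```python
import warnings

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

WARNING_TEXT = "None of the inputs have requires_grad=True"


def warning_and_grad(use_reentrant: bool):
    """Return (warning_was_raised, checkpointed_layer_has_grad)."""
    torch.manual_seed(0)
    block = nn.Linear(4, 4)    # trainable layer run under checkpointing
    head = nn.Linear(4, 1)     # trainable layer outside the checkpoint
    hidden = torch.randn(3, 4) # input that does NOT require grad

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        out = checkpoint(block, hidden, use_reentrant=use_reentrant)
    warned = any(WARNING_TEXT in str(w.message) for w in caught)

    head(out).sum().backward()
    return warned, block.weight.grad is not None


print("use_reentrant=True: ", warning_and_grad(True))
print("use_reentrant=False:", warning_and_grad(False))
```

In this toy setting the reentrant mode warns and leaves the checkpointed layer without a gradient, while the non-reentrant mode does neither.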
Thanks for the follow-up. Can you submit a PR so that we can run some tests?
Reproduction
Are online DPO and GRPO supposed to work with gradient checkpointing enabled?
I always get this warning when using them:
/usr/local/lib/python3.11/dist-packages/torch/utils/checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
The model then doesn't seem to learn: the training loss goes up and down, and changing the learning rate doesn't seem to have any impact.
Here is the notebook (simple code) to reproduce the error:
https://colab.research.google.com/drive/1Tb2m_EBdKuuELEEMkA7YYHmOIxozMBmu?usp=sharing
I tried many variations and first thought it was related to the use of an adapter, but it isn't.
This notebook runs online DPO but I have the exact same problem with GRPO.
PS: use_vllm doesn't work with a peft config. In the same notebook, setting use_vllm=True together with a peft_config triggers an error.
System Info
Google Colab L4/A100