🔁 🦈 Support iterative GRPO #2700
base: main
Conversation
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Nice! Can you try locally with multi-GPU / DeepSpeed ZeRO 1/2/3? If you don't have the hardware, I can do it.
In the DeepSeek-R1 paper, I think they sync the ref after each epoch, no?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@qgallouedec for the update, I think they do it after one complete iteration (epoch), but I am not sure, because I think this way there might be a conflict with the default. Maybe I am misunderstanding?
Note that this algorithm and the ref_update discussion come from the DeepSeekMath paper, where the GRPO math is discussed. But the question still remains! 🤔
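For concreteness, here is a minimal sketch of the step-based sync being discussed. The function name `maybe_sync_ref_model` and the mixup form are illustrative assumptions, not this PR's actual code; with `alpha=1.0` it reduces to the hard periodic refresh from the DeepSeekMath paper:

```python
import torch

def maybe_sync_ref_model(policy, ref_model, global_step, sync_steps, alpha=1.0):
    """Every `sync_steps` optimizer steps, pull the reference model toward the policy.

    With alpha=1.0 this is a hard copy (periodic refresh); with alpha<1.0 it is
    a soft/mixup update. All names here are illustrative, not the PR's API.
    """
    if global_step == 0 or global_step % sync_steps != 0:
        return
    with torch.no_grad():
        for ref_param, param in zip(ref_model.parameters(), policy.parameters()):
            # ref <- alpha * policy + (1 - alpha) * ref
            ref_param.data.mul_(1.0 - alpha).add_(param.data, alpha=alpha)
```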
Don't bother with multi-GPU, I'll go test it myself. I think we understand it similarly. I'm wondering what the user would expect. Let me run some tests; I'll come back to you.
What does this PR do?
Following the thread in issue #2684, and based on the DeepSeek paper, we concluded that we need a feature that can iteratively update the reference model every `ref_model_sync_steps` steps.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
@qgallouedec
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.