
🔁 🦈 Support iterative GRPO #2700

Open · shirinyamani wants to merge 6 commits into main

Conversation

@shirinyamani (Author) commented on Jan 30, 2025

What does this PR do?

Following the thread of issue #2684, and based on the DeepSeek paper, we concluded that we need a feature that can iteratively update the reference model every so often (every ref_model_sync_steps steps).
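Roughly, the idea looks like this. A minimal PyTorch sketch, not the PR's actual code — `maybe_sync_ref_model` and `mixup_alpha` are illustrative names, and `ref_model_sync_steps` is passed as a plain argument here:

```python
import torch


def maybe_sync_ref_model(policy, ref_model, global_step,
                         ref_model_sync_steps=64, mixup_alpha=1.0):
    """Every `ref_model_sync_steps` optimizer steps, blend policy weights
    into the reference model:  ref <- alpha * policy + (1 - alpha) * ref.
    mixup_alpha=1.0 is a hard copy; alpha < 1.0 gives a soft (Polyak) update.
    """
    if global_step == 0 or global_step % ref_model_sync_steps != 0:
        return
    with torch.no_grad():
        for ref_p, pol_p in zip(ref_model.parameters(), policy.parameters()):
            ref_p.data.mul_(1.0 - mixup_alpha).add_(pol_p.data, alpha=mixup_alpha)
```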

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

@qgallouedec
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@qgallouedec (Member)

Nice! Can you try locally with multi-GPU / DeepSpeed ZeRO 1/2/3? If you don't have the hardware, I can do it.

@qgallouedec (Member)

In the DeepSeek-R1 paper, I think they sync the ref after each epoch, no?


@shirinyamani (Author) commented on Jan 30, 2025

@qgallouedec
I don't have access to multi-GPU at the moment, unfortunately! I can request access, but it might take a long time for them to assign a GPU to me!

As for the update, I think they do it after one complete iteration (epoch), but I'm not sure, because there might be a conflict: the default ref_model_sync_steps is 64, meaning the ref_model update happens after that many steps, but one epoch will probably take a lot more steps than that! (i.e., for the once-per-epoch scenario we'd have to set ref_model_sync_steps to however many steps a full epoch takes — see the sketch below the screenshot.)

Maybe I am misunderstanding?

[Screenshot: the iterative GRPO algorithm from the DeepSeekMath paper, including the reference-model update step]
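To make the step counting concrete, here's a back-of-the-envelope sketch. All numbers are made up for illustration (nothing here comes from the PR), showing what ref_model_sync_steps would have to be for a once-per-epoch sync:

```python
# All numbers below are hypothetical, purely for illustration.
num_train_samples = 8192
per_device_batch_size = 4
num_gpus = 8
gradient_accumulation_steps = 2

# Effective samples consumed per optimizer step: 4 * 8 * 2 = 64.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
steps_per_epoch = num_train_samples // effective_batch_size  # 8192 // 64 = 128

# A once-per-epoch sync would need ref_model_sync_steps = steps_per_epoch (128),
# which is double the default of 64 in this example.
ref_model_sync_steps = steps_per_epoch
```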

@shirinyamani (Author) commented on Jan 30, 2025

Note that this algorithm and the ref-model-update discussion are from the DeepSeekMath paper, where they lay out the GRPO math. But the question still remains! 🤔

@qgallouedec (Member)

Don't bother with multi-GPU, I'll go test it myself.

I think we understand it the same way; I'm wondering what the user would expect.
The soft update as implemented probably gives better results, but it doesn't match the paper.

Let me make some tests. I'll come back to you.
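For what it's worth, the two behaviors side by side — a hedged sketch with toy nn.Linear stand-ins for the policy and reference model; alpha is an illustrative mixup weight, not necessarily the PR's parameter:

```python
import torch
import torch.nn as nn

policy = nn.Linear(4, 4)     # stand-in for the policy model
ref_model = nn.Linear(4, 4)  # stand-in for the reference model

# Hard sync, matching the paper's per-iteration reset: ref <- policy.
ref_model.load_state_dict(policy.state_dict())

# Soft update (what "soft update as implemented" would look like):
# ref <- alpha * policy + (1 - alpha) * ref, applied every sync step.
alpha = 0.6
with torch.no_grad():
    for ref_p, pol_p in zip(ref_model.parameters(), policy.parameters()):
        ref_p.data.mul_(1 - alpha).add_(pol_p.data, alpha=alpha)
```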
