🔁 🦈 Support iterative GRPO #2700
base: main
Conversation
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Nice! Can you try locally with multi-GPU / DeepSpeed ZeRO 1/2/3? If you don't have the hardware, I can do it.
In the DeepSeek-R1 paper, I think they sync the ref after each epoch, no?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@qgallouedec for the update, I think they do it after one complete iteration (epoch), but I am not sure, because I think this way there might be a conflict with the default. Maybe I am misunderstanding?
Note that this algorithm and the ref_update discussion come from the DeepSeekMath paper, where the GRPO math is discussed. But the question still remains! 🤔
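For concreteness, here is a minimal sketch of the step-based sync being discussed. The function name `maybe_sync_ref_model` and the mixup form are illustrative assumptions, not this PR's actual code; with `alpha=1.0` it reduces to the hard periodic refresh from the DeepSeekMath paper:

```python
import torch

def maybe_sync_ref_model(policy, ref_model, global_step, sync_steps, alpha=1.0):
    """Every `sync_steps` optimizer steps, pull the reference model toward the policy.

    With alpha=1.0 this is a hard copy (periodic refresh); with alpha<1.0 it is
    a soft/mixup update. All names here are illustrative, not the PR's API.
    """
    if global_step == 0 or global_step % sync_steps != 0:
        return
    with torch.no_grad():
        for ref_param, param in zip(ref_model.parameters(), policy.parameters()):
            # ref <- alpha * policy + (1 - alpha) * ref
            ref_param.data.mul_(1.0 - alpha).add_(param.data, alpha=alpha)
```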
Don't bother with multi-GPU, I'll go test it myself. I think we understand it similarly. I'm wondering what the user would expect. Let me run some tests; I'll come back to you.
What does this PR do?
Following the thread in issue #2684, and based on the DeepSeek paper, we concluded that we need a feature that can iteratively update the reference model every `ref_model_sync_steps` steps.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
@qgallouedec
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.