Evaluate GRPO vs. other RL algorithms #11
Per https://x.com/jiayi_pirate/status/1882839504899420517: even though GRPO is implemented first, would this be a good repo for adding evals and work on other RL algorithms? Given where R1 landed, and not having had time to reach a conclusion myself, it seems like interesting work to pursue after replication.

Comments
Thanks @lewtun! I was looking through lighteval, and I've played with TRL, but this is exactly what I wanted to dig into. Mind if I use this as a tracking issue for subsequent ones and keep this one open? WIP PPO reward funcs: https://github.com/gerred/open-r1. I'll add more for PRIME, and I got a good suggestion to also look at how Kimi operates with its long-CoT RL.
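For reference, a minimal sketch of what verifier-style reward functions for a run like this might look like. The `<think>`/`<answer>` tag format, the function names, and the exact-match check are assumptions for illustration, not code from the linked fork:

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed <think>...</think>
    <answer>...</answer> structure, else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the text inside <answer>...</answer> matches the reference
    answer after whitespace normalization, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```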
Sounds good @gerred!
Never mind, I answered it for myself while doing PPO!
Hi! I'm familiar with lighteval and would be happy to help with evaluation. @gerred, let me know if there's anything I can support you with.
@mariagrandury Yes please! I am working through the top level and in TRL, getting to a base run. I'm taking some AFK time after fighting a local NCCL issue I found, so I'll be back in a few hours to spin up some instances, and I'll get the branch up for open-r1. For PPO I'm currently weighting the two verifier funcs evenly, but I'm taking a very naive approach.
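Concretely, the even weighting might look like the sketch below, reusing the hypothetical `format_reward` / `accuracy_reward` verifiers from the earlier snippet. The 50/50 split is a naive starting point, not a tuned choice:

```python
def combined_reward(completion: str, ground_truth: str) -> float:
    # Naive 50/50 blend of the two verifier rewards; the equal weights
    # are an assumption for illustration, not values from the actual run.
    return 0.5 * format_reward(completion) + 0.5 * accuracy_reward(completion, ground_truth)
```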
Any updates on other RL methods (e.g., PPO) based on the Open-R1 repo?