Hi! I think there is a bug in the generations that get logged as model checkpoints. Specifically, the output of generate_completions() in rloo_trainer.py does not reflect the model's actual generations during training, which does not seem to be the desired behavior. In particular, the temperature in generate_completions() is set to 0.01.
Reproduction
The temperature in generate_completions() differs from the one used in train().
As a result, the generations logged at model checkpoints differ from those sampled in the actual RLOO process.
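To illustrate why the mismatch matters, here is a minimal sketch of how temperature reshapes the sampling distribution. The logits are made up, and the training temperature of 0.7 is an assumed value for illustration only, not one read from the trainer; only the 0.01 figure comes from the report above.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` scaled by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.5, 0.5]

EVAL_TEMPERATURE = 0.01  # value reported for generate_completions()
TRAIN_TEMPERATURE = 0.7  # assumed training value, for illustration only

eval_probs = softmax_with_temperature(logits, EVAL_TEMPERATURE)
train_probs = softmax_with_temperature(logits, TRAIN_TEMPERATURE)

print("eval :", [round(p, 3) for p in eval_probs])
print("train:", [round(p, 3) for p in train_probs])
```

At 0.01 nearly all probability mass lands on the top token, so sampling is effectively greedy, while at the assumed training temperature the other tokens keep a noticeable share. Logged completions would therefore look more deterministic than the samples that drive the policy update.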
I would be open to doing a small pull request to fix this!
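One possible shape for such a fix, sketched under assumptions: build the sampling settings in one place and reuse them for both training and logging. SamplingArgs and make_generation_kwargs are hypothetical names for illustration, not part of TRL, and the default values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class SamplingArgs:
    """Hypothetical stand-in for the trainer's config; not the TRL class."""

    temperature: float = 0.7   # assumed value, for illustration only
    response_length: int = 32  # arbitrary value, for illustration only


def make_generation_kwargs(args: SamplingArgs) -> dict:
    """Build the sampling settings once so both call sites share them."""
    return {
        "max_new_tokens": args.response_length,
        "temperature": args.temperature,
        "do_sample": True,
    }


args = SamplingArgs()
train_kwargs = make_generation_kwargs(args)    # would be used in train()
logging_kwargs = make_generation_kwargs(args)  # would be used in generate_completions()

print(train_kwargs == logging_kwargs)
```

Deriving both sets of settings from the same config field means a change to the training temperature carries over to the logged completions automatically.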
System Info
Examined files in main:
trl/trl/trainer/rloo_trainer.py, line 575 in 4659ad9
trl/trl/trainer/rloo_trainer.py, line 261 in 4659ad9