rewards_funcs set to eval mode #2685

shirinyamani · 2025-01-29T22:29:40Z

Shouldn't the reward_funcs here be set to eval mode to disable the gradient, e.g. by calling `self.reward_funcs[i].eval()` for each `reward_funcs[i]`?


qgallouedec · 2025-01-29T23:02:34Z

Reward models are called in inference mode.

trl/trl/trainer/grpo_trainer.py

Line 485 in 801582e

with torch.inference_mode():

I think it's the same, but I'm not 100% sure.
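For anyone comparing the two, here is a minimal sketch of how `torch.inference_mode()` and `.eval()` behave in plain PyTorch. It assumes a toy module with dropout as a hypothetical stand-in for a reward model and is not taken from the TRL code. As far as PyTorch itself goes, `inference_mode` turns off autograd tracking, while `.eval()` switches layers such as dropout and batch norm to evaluation behavior, so the two are independent settings.

```python
import torch
from torch import nn

# Hypothetical stand-in for a reward model: any module with dropout shows the effect.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

# inference_mode disables autograd tracking but leaves the train/eval flag untouched.
model.train()
with torch.inference_mode():
    out_train = model(x)
grad_tracked_in_inference = out_train.requires_grad  # False: no graph is recorded
still_training = model.training                      # True: dropout is still active

# eval() turns dropout into a no-op but does not disable gradient tracking.
model.eval()
out_eval = model(x)
grad_tracked_in_eval = out_eval.requires_grad        # True: parameters still require grad

# With both applied, outputs are gradient-free and repeatable.
with torch.inference_mode():
    first = model(x)
    second = model(x)
deterministic = torch.equal(first, second)           # True
```

Whether this matters in practice depends on whether the reward model contains dropout or batch-norm layers and on whether it was already loaded in eval mode.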

shirinyamani changed the title from "🏋 GRPO" "❓ question" to 🏋 GRPO --❓ question · Jan 29, 2025

github-actions bot added 🏋 GRPO Related to GRPO 🏋 Reward Related to Reward modelling ❓ question Seeking clarification or more information labels Jan 29, 2025

shirinyamani changed the title from 🏋 GRPO --❓ question to rewards_funcs set to eval mode · Jan 29, 2025
