reward_funcs set to eval mode #2685
Labels: 🏋 GRPO (Related to GRPO), ❓ question (Seeking clarification or more information), 🏋 Reward (Related to Reward modelling)
Shouldn't the reward_funcs here be set to eval mode to disable the gradient? For example, self.reward_funcs.eval(), or calling .eval() on each reward_funcs[i].
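In PyTorch, "eval mode" and "no gradient" are two separate switches. Module.eval() only changes layer behaviour such as Dropout and BatchNorm. It does not stop autograd from recording operations. Gradients are disabled with torch.no_grad() or torch.inference_mode(), or by freezing parameters with requires_grad_(False). The minimal sketch below assumes reward_funcs is a plain Python list that can mix nn.Module reward models with ordinary callables. The names make_reward_model, length_reward and set_eval are hypothetical illustrations and are not part of TRL's API.

```python
import torch
import torch.nn as nn


def make_reward_model() -> nn.Module:
    """Hypothetical stand-in for a reward model; any nn.Module behaves the same way."""
    return nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))


def length_reward(completions):
    """Hypothetical rule-based reward: a plain callable with no .eval() method."""
    return [float(len(c)) for c in completions]


def set_eval(reward_funcs):
    """Call .eval() on every nn.Module in the list; plain callables are skipped."""
    for func in reward_funcs:
        if isinstance(func, nn.Module):
            func.eval()


torch.manual_seed(0)
reward_model = make_reward_model()
reward_funcs = [reward_model, length_reward]  # a list itself has no .eval()
set_eval(reward_funcs)

x = torch.randn(2, 4)

# eval() turns Dropout into a no-op, but autograd still tracks the parameters.
out_eval = reward_model(x)
print(reward_model.training, out_eval.requires_grad)  # False True

# Gradient tracking is switched off by a context manager, not by eval().
with torch.inference_mode():
    out_scored = reward_model(x)
print(out_scored.requires_grad)  # False
```

If reward_funcs is a plain list, as assumed here, self.reward_funcs.eval() would raise AttributeError, so the per-element loop is the workable form. For scoring alone, a no-grad context is the part that actually prevents gradients. Calling eval() additionally makes the reward model's outputs deterministic when it contains dropout.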