logging issue: generation_config in rloo_trainer.py's generate_completions() is not reflective of actual model generations #2678

Open
5 tasks done
swkarlekar opened this issue Jan 28, 2025 · 0 comments
Labels
🐛 bug Something isn't working 🏋 RLOO Related to RLOO

swkarlekar commented Jan 28, 2025

Reproduction

Hi! I think there is a bug in the generations that get logged for model checkpoints. Specifically, the generation_config in rloo_trainer.py's generate_completions() does not reflect the settings used for the model's actual generations, which does not seem to be the desired behavior. In particular, the temperature in generate_completions() is hardcoded to 0.01:

    def generate_completions(self, sampling: bool = False):
        args = self.args
        processing_class = self.processing_class
        generation_config = GenerationConfig(
            max_new_tokens=self.args.response_length,
            temperature=(0.01 + 1e-7),
            top_k=0.0,
            top_p=1.0,
            do_sample=True,
        )
...

The temperature in generate_completions() is not the same as the one used in train():

...
        generation_config = GenerationConfig(
            max_new_tokens=args.response_length,
            temperature=(args.temperature + 1e-7),
            top_k=0.0,
            top_p=1.0,
            do_sample=True,
        )
...

As a result, the generations logged for model checkpoints are sampled with a different temperature from the one used in the actual RLOO process, so they do not reflect what the model produces during training.
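To illustrate why the two temperatures matter, here is a minimal sketch in plain Python (no trainer code involved). The logits and the training temperature of 0.7 are made-up values for illustration only; the point is that dividing logits by 0.01 makes sampling almost greedy, while a larger temperature keeps real probability mass on other tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for logits scaled by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up next-token logits, for illustration only.
logits = [2.0, 1.5, 0.5]

# Temperature hardcoded in generate_completions().
logged = softmax_with_temperature(logits, 0.01 + 1e-7)

# Assumed value standing in for args.temperature in train().
training = softmax_with_temperature(logits, 0.7 + 1e-7)

print("logged:  ", [round(p, 4) for p in logged])
print("training:", [round(p, 4) for p in training])
```

With the hardcoded value, nearly all probability lands on the top token, so the logged samples behave like greedy decoding regardless of what args.temperature is set to.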

I would be open to doing a small pull request to fix this!
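One possible shape for such a fix, as a sketch only: build the sampling settings in one place and reuse them for both training and logging. The `Args` dataclass and the `build_generation_kwargs` helper below are hypothetical stand-ins, not part of TRL, and the default values are placeholders; in the trainer the resulting keyword arguments would presumably be passed to GenerationConfig.

```python
from dataclasses import dataclass

@dataclass
class Args:
    """Hypothetical stand-in for the trainer config; values are placeholders."""
    response_length: int = 53
    temperature: float = 0.7

def build_generation_kwargs(args):
    """Hypothetical helper: a single source of truth for sampling settings."""
    return dict(
        max_new_tokens=args.response_length,
        temperature=args.temperature + 1e-7,  # same expression as in train()
        top_k=0.0,
        top_p=1.0,
        do_sample=True,
    )

args = Args()
train_kwargs = build_generation_kwargs(args)    # would be used in train()
logging_kwargs = build_generation_kwargs(args)  # would be used in generate_completions()

print(train_kwargs == logging_kwargs)
```

The smaller change would be to replace 0.01 with args.temperature directly in generate_completions(); the shared helper just makes it harder for the two call sites to drift apart again.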

System Info

Examined files in main

temperature=(0.01 + 1e-7),

temperature=(args.temperature + 1e-7),

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete