Add E2B code interpreter reward function #364

Merged
merged 38 commits into from
Feb 19, 2025

@lewtun (Member) commented Feb 18, 2025

This PR adds a reward function to execute Python code safely via E2B: https://e2b.dev/docs/legacy/sandbox/api/debugging

It currently targets coding competitions, where the ground truth is given as test cases and the reward per problem is the overall success rate, i.e. the fraction of test cases passed.
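
As a rough illustration of the reward described above, here is a minimal sketch of a per-problem success rate. It is not the code from this PR: the function name `success_rate`, the test-case layout (dicts with `input` and `output` keys), and the use of a local subprocess are all assumptions made for the example. The PR executes code in an E2B sandbox instead; a local subprocess is not a sandbox and should not be used for untrusted code.

```python
import subprocess
import sys


def success_rate(code: str, test_cases: list[dict], timeout: float = 5.0) -> float:
    """Return the fraction of test cases whose stdout matches the expected output.

    Hypothetical helper: each test case is assumed to look like
    {"input": "<stdin text>", "output": "<expected stdout>"}.
    """
    if not test_cases:
        return 0.0

    passed = 0
    for case in test_cases:
        try:
            # Local stand-in for the sandboxed execution step (NOT isolated).
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=case["input"],
                text=True,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed test case

        if proc.returncode == 0 and proc.stdout.strip() == case["output"].strip():
            passed += 1

    return passed / len(test_cases)
```

With this definition, a completion that passes two of three test cases gets a reward of 2/3 rather than 0, so partially correct solutions still receive a signal.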

Experiments to test the feature can be viewed here: https://wandb.ai/huggingface/open-r1/reports/Qwen2-5-1-5B-Instruct-with-Python-code-interpreter--VmlldzoxMTQxNTA3Mg

TODO

  • Add some docs on how to set this up with E2B
  • Run some experiments to sanity check

# TODO: add support for other languages in E2B: https://e2b.dev/docs/code-interpreting/supported-languages
try:
"""Returns a reward function that evaluates code snippets in a sandbox."""
evaluation_script_template = """
Member

It's surprising that you don't have any issue with the extra indentation.

Member

I usually use textwrap.dedent in this case, but it might not be necessary here for some reason.
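
For reference, a small sketch of the `textwrap.dedent` pattern mentioned here. The helper `build_script` and its template are hypothetical and are not taken from this PR. The sketch only shows what happens when an indented triple-quoted template is handed to the plain Python compiler; it does not explain why the template in this PR works without dedenting.

```python
import textwrap


def build_script(expression: str, dedent: bool = True) -> str:
    """Fill a triple-quoted template that is indented to match the function body.

    Hypothetical helper for illustration only.
    """
    template = """
    result = {expression}
    print(result)
    """
    if dedent:
        # Strip the common leading whitespace so statements start at column 0.
        template = textwrap.dedent(template)
    return template.format(expression=expression)


dedented = build_script("1 + 2")
compile(dedented, "<script>", "exec")  # compiles cleanly

try:
    compile(build_script("1 + 2", dedent=False), "<script>", "exec")
except IndentationError as err:
    print(f"raw template rejected: {err.msg}")
```

Dedenting the template before calling `.format` keeps the result predictable even if the substituted text itself contains newlines.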

@qgallouedec (Member) left a comment

🔥🔥🔥🔥

@lewtun merged commit d76ecc1 into main on Feb 19, 2025
1 check passed
@lewtun deleted the grpo-code branch on February 19, 2025 at 10:26