Add E2B code interpreter reward function #364
# TODO: add support for other languages in E2B: https://e2b.dev/docs/code-interpreting/supported-languages
try:
"""Returns a reward function that evaluates code snippets in a sandbox."""
evaluation_script_template = """
It's surprising that you don't have any issue with the extra indentation.
I usually use textwrap.dedent in this case, but it might not be necessary here for some reason.
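For context, here is a minimal sketch of the indentation concern. The template below is hypothetical, not the one from this PR; it only shows how a triple-quoted string defined inside an indented block keeps its leading whitespace, and how textwrap.dedent removes it.

```python
import textwrap


def make_template() -> str:
    # Hypothetical template (not the PR's). Because the string literal sits
    # inside an indented function body, every line carries 4 leading spaces.
    return """
    import json
    results = [1, 0, 1]
    print(json.dumps(results))
    """


raw = make_template()
clean = textwrap.dedent(raw)  # strips the common leading whitespace

# Compiling the raw string as a top-level script fails on the indented
# first statement; the dedented version compiles.
try:
    compile(raw, "<raw>", "exec")
    raw_compiles = True
except IndentationError:
    raw_compiles = False

compile(clean, "<clean>", "exec")
```

Whether the extra indentation actually causes a problem depends on how the script is assembled and executed, which this sketch does not try to reproduce.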
🔥🔥🔥🔥
This PR adds a reward function to execute Python code safely via E2B: https://e2b.dev/docs/legacy/sandbox/api/debugging
It currently targets coding-competition problems where the ground truth is given as test cases, and the reward for each problem is the success rate across those test cases.
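A rough sketch of the success-rate idea, under stated assumptions: the test-case format (`input`/`output` keys), the function names, and the local subprocess runner are all hypothetical stand-ins for illustration. The PR itself executes code in an E2B sandbox; a plain subprocess provides no isolation.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_locally(code: str, stdin: str, timeout: float = 5.0) -> Optional[str]:
    """Hypothetical stand-in for the sandbox: runs code in a local
    subprocess (NOT isolated). Returns stdout, or None on timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout


def success_rate(
    code: str,
    test_cases: list,
    run: Callable[[str, str], Optional[str]] = run_locally,
) -> float:
    """Fraction of test cases whose expected output matches the program output."""
    if not test_cases:
        return 0.0
    passed = 0
    for case in test_cases:
        out = run(code, case["input"])
        if out is not None and out.strip() == case["output"].strip():
            passed += 1
    return passed / len(test_cases)


# Assumed test-case format; the last case is deliberately wrong.
cases = [
    {"input": "2 3\n", "output": "5"},
    {"input": "10 -4\n", "output": "6"},
    {"input": "1 1\n", "output": "3"},
]
solution = "a, b = map(int, input().split())\nprint(a + b)"
reward = success_rate(solution, cases)  # two of the three cases pass
```

Passing the runner as a parameter keeps the scoring logic separate from the execution backend, so a sandboxed runner could be swapped in without touching `success_rate`.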
Experiments to test the feature can be viewed here: https://wandb.ai/huggingface/open-r1/reports/Qwen2-5-1-5B-Instruct-with-Python-code-interpreter--VmlldzoxMTQxNTA3Mg
TODO