parameterize enable_prefix_caching #2900

Open: wants to merge 2 commits into main

@ji-huazhong (Contributor) commented Feb 19, 2025

What does this PR do?

Fixes #2798
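Here is a minimal Python sketch of what parameterizing the flag means. It assumes the trainer forwards a boolean config field to vLLM's `LLM` constructor, which accepts `enable_prefix_caching`, `dtype`, and `gpu_memory_utilization` as engine arguments. The class `VLLMGenerationConfigSketch`, the helper `build_vllm_llm_kwargs`, and the placeholder defaults for the non-caching fields are hypothetical and are not TRL's actual code. Only the `True` default for `vllm_enable_prefix_caching` comes from this PR.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class VLLMGenerationConfigSketch:
    """Hypothetical subset of a GRPO-style trainer config (illustration only)."""

    # Placeholder defaults; not taken from TRL.
    vllm_dtype: str = "auto"
    vllm_gpu_memory_utilization: float = 0.9
    # User-facing switch. It defaults to True, as stated in this PR.
    vllm_enable_prefix_caching: bool = True


def build_vllm_llm_kwargs(model: str, cfg: VLLMGenerationConfigSketch) -> dict[str, Any]:
    """Collect keyword arguments intended for vllm.LLM(**kwargs).

    Only the dict is built here, so the sketch runs without vLLM installed.
    """
    return {
        "model": model,
        "dtype": cfg.vllm_dtype,
        "gpu_memory_utilization": cfg.vllm_gpu_memory_utilization,
        # Assumed to have been hard-coded to True before this PR; now configurable.
        "enable_prefix_caching": cfg.vllm_enable_prefix_caching,
    }


# Example: the settings a V100 user might choose, with prefix caching turned off.
v100_cfg = VLLMGenerationConfigSketch(
    vllm_dtype="half",
    vllm_gpu_memory_utilization=0.7,
    vllm_enable_prefix_caching=False,
)
v100_kwargs = build_vllm_llm_kwargs("Qwen/Qwen2.5-1.5B-Instruct", v100_cfg)
```

Keeping the default at `True` preserves existing behavior for users who never set the field. Users on hardware where prefix caching fails can opt out.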

Tested with the open-r1 GRPO configuration, lightly modified (for functional verification only):

```yaml
# Model arguments
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: float16
attn_implementation: eager

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_configs:
- default
system_prompt: "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"

# GRPO trainer config
bf16: true
use_vllm: true
vllm_device: auto
vllm_dtype: half
vllm_gpu_memory_utilization: 0.7
vllm_enable_prefix_caching: true
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
# hub_model_id: Qwen2.5-1.5B-Open-R1-GRPO
# hub_strategy: every_save
learning_rate: 2.0e-05
log_completions: true
log_level: info
logging_first_step: false
logging_steps: 100
logging_strategy: steps
lr_scheduler_type: cosine
max_prompt_length: 512
max_completion_length: 1024
max_steps: -1
num_generations: 7
num_train_epochs: 1
output_dir: data/Qwen2.5-1.5B-Open-R1-GRPO
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: false
report_to:
- none
reward_funcs:
- accuracy
- format
reward_weights:
- 1.0
- 1.0
save_strategy: "epoch"
save_total_limit: 1
seed: 42
warmup_ratio: 0.1
```

With vllm_enable_prefix_caching: true (the default):

[screenshot]

With vllm_enable_prefix_caching: false:

[screenshot]
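The two runs differ only in this flag. The linked issue concerns V100 GPUs, so a user might choose the flag's value from the GPU's compute capability. The sketch below does that under an explicit assumption: prefix caching is treated as unsafe below compute capability 8.0 (V100 reports 7.0). This PR does not confirm that threshold, and the helper name `default_prefix_caching` is hypothetical. With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple that the helper expects.

```python
def default_prefix_caching(
    capability: tuple[int, int],
    min_capability: tuple[int, int] = (8, 0),
) -> bool:
    """Suggest a value for vllm_enable_prefix_caching given a GPU's compute capability.

    ASSUMPTION: the (8, 0) threshold is illustrative and not taken from vLLM or TRL.
    """
    # Tuples compare lexicographically, so (7, 0) < (8, 0) <= (8, 6).
    return capability >= min_capability


# e.g. capability = torch.cuda.get_device_capability()  # (7, 0) on a V100
v100_choice = default_prefix_caching((7, 0))
a100_choice = default_prefix_caching((8, 0))
```

An explicit config value is still the simpler and more predictable option. A helper like this only suggests a default and does not detect whether vLLM will actually fail.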

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

cc @qgallouedec

@qgallouedec (Member) left a comment:

Thanks for the contribution. Can you just please update the doc and we're good to merge.

@ji-huazhong (Contributor, Author):

@qgallouedec Made the suggested change. :)


Successfully merging this pull request may close these issues.

Error when using use_vllm=True with GRPOTrainer on V100 GPUs