parameterize enable_prefix_caching #2900

ji-huazhong · 2025-02-19T01:44:32Z

What does this PR do?

Tested with the open-r1 configuration, slightly modified (for functional verification only):

# Model arguments
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: float16
attn_implementation: eager

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_configs:
- default
system_prompt: "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"

# GRPO trainer config
bf16: true
use_vllm: true
vllm_device: auto
vllm_dtype: half
vllm_gpu_memory_utilization: 0.7
vllm_enable_prefix_caching: true
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
# hub_model_id: Qwen2.5-1.5B-Open-R1-GRPO
# hub_strategy: every_save
learning_rate: 2.0e-05
log_completions: true
log_level: info
logging_first_step: false
logging_steps: 100
logging_strategy: steps
lr_scheduler_type: cosine
max_prompt_length: 512
max_completion_length: 1024
max_steps: -1
num_generations: 7
num_train_epochs: 1
output_dir: data/Qwen2.5-1.5B-Open-R1-GRPO
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: false
report_to:
- none
reward_funcs:
- accuracy
- format
reward_weights:
- 1.0
- 1.0
save_strategy: "epoch"
save_total_limit: 1
seed: 42
warmup_ratio: 0.1
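For readers unfamiliar with what "parameterizing" the flag means here, the following is a minimal sketch of forwarding config fields such as `vllm_enable_prefix_caching` to the vLLM engine. `VLLMOptions` and `build_llm_kwargs` are hypothetical names made up for illustration and are not the code in this PR. The sketch assumes that `vllm.LLM` accepts `dtype`, `gpu_memory_utilization`, and `enable_prefix_caching` as keyword arguments.

```python
from dataclasses import dataclass


@dataclass
class VLLMOptions:
    """Hypothetical stand-in for the vLLM-related fields of the trainer config."""

    vllm_dtype: str = "auto"
    vllm_gpu_memory_utilization: float = 0.9
    # Defaults to True, matching the "(default)" note in this PR description.
    vllm_enable_prefix_caching: bool = True


def build_llm_kwargs(model: str, opts: VLLMOptions) -> dict:
    """Map the config fields to keyword arguments intended for vllm.LLM."""
    return {
        "model": model,
        "dtype": opts.vllm_dtype,
        "gpu_memory_utilization": opts.vllm_gpu_memory_utilization,
        "enable_prefix_caching": opts.vllm_enable_prefix_caching,
    }


# Values taken from the YAML above, with prefix caching switched off.
opts = VLLMOptions(
    vllm_dtype="half",
    vllm_gpu_memory_utilization=0.7,
    vllm_enable_prefix_caching=False,
)
kwargs = build_llm_kwargs("Qwen/Qwen2.5-1.5B-Instruct", opts)
# With vLLM installed, these could be passed on as vllm.LLM(**kwargs).
```

Keeping the flag in the config rather than hard-coding it lets a user turn prefix caching off from YAML without patching the trainer.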

with vllm_enable_prefix_caching: true (default):

with vllm_enable_prefix_caching: false:

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @qgallouedec

trl/trainer/grpo_config.py

qgallouedec

Thanks for the contribution. Could you please update the docs? Then we're good to merge.

ji-huazhong · 2025-02-19T11:27:53Z

@qgallouedec Made the suggested change. :)

parameterize enable_prefix_caching

47f1e60

qgallouedec mentioned this pull request Feb 19, 2025

[GRPO] Disable prefix cache for models with sliding window #2866

Closed

5 tasks
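The linked PR title concerns models with sliding-window attention. With the flag exposed in the config, a caller could make that choice before building the trainer arguments. Below is a minimal sketch under stated assumptions: `should_enable_prefix_caching` is a hypothetical helper, not part of TRL or of either PR, and it assumes the model config may expose `sliding_window` and, for some architectures, a separate `use_sliding_window` switch.

```python
from types import SimpleNamespace


def should_enable_prefix_caching(model_config) -> bool:
    """Return False when the config appears to use sliding-window attention."""
    window = getattr(model_config, "sliding_window", None)
    # Assumption: some configs carry a window size but gate it behind a
    # separate boolean; treat the window as active when the switch is absent.
    window_active = getattr(model_config, "use_sliding_window", True)
    return not (window is not None and window_active)


# Illustrative stand-ins for model configs (not loaded from any real model).
full_attention = SimpleNamespace()
windowed = SimpleNamespace(sliding_window=4096)
window_disabled = SimpleNamespace(sliding_window=4096, use_sliding_window=False)

flags = [
    should_enable_prefix_caching(cfg)
    for cfg in (full_attention, windowed, window_disabled)
]
```

The result of such a check could then be used as the value of `vllm_enable_prefix_caching`.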

qgallouedec reviewed Feb 19, 2025

trl/trainer/grpo_config.py (outdated, resolved)

qgallouedec reviewed Feb 19, 2025

ji-huazhong force-pushed the issue-2798 branch from 6492607 to c7d8a26 on February 19, 2025 11:20

apply review suggestion

416908b

ji-huazhong force-pushed the issue-2798 branch from c7d8a26 to 416908b on February 19, 2025 11:24
