add "_prepare_fsdp" for DPOTrainer #2539
base: main
Conversation
Thanks a lot for the fix @faaany - overall it looks great!
Would you mind confirming that the following demo command works with your PR (once activation checkpointing is removed):
accelerate launch --config_file=examples/accelerate_configs/fsdp_qlora.yaml --num_processes=NUM_GPUS trl/scripts/dpo.py \
--dataset_name trl-lib/ultrafeedback_binarized \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--learning_rate 5.0e-7 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--gradient_checkpointing \
--logging_steps 25 \
--eval_strategy steps \
--eval_steps 50 \
--output_dir Qwen2-0.5B-DPO \
--no_remove_unused_columns
If it runs without error, can you please rename fsdp_qlora.yaml to fsdp.yaml so it runs for both modes?
A question for @qgallouedec: should this helper function live in a utils module somewhere so we don't have to copy it around to all other trainers?
I tried running the demo command without QLoRA and got an error. @faaany, I am wondering if you were able to replicate or fix this. I am attaching the trainer code for reference.
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Both modes work.
Does it work in other distributed modes, e.g. DeepSpeed?
I think it would make sense to have it in
I have not tried other modes, but I am observing this problem with FSDP.
I can't reproduce your issue. But from my experience, this is likely not related to FSDP but to your environment setup. I am using transformers==4.47.1 and accelerate==1.2.1. Could you try a different transformers version, just like this post says?
@qgallouedec @lewtun How about DeepSpeed? Should we use the
@faaany: My versions of transformers and accelerate are the same as yours. What version of torch are you using?
I am using torch 2.5.1+cu121.
Hi folks, can we merge this PR or should I move this method to
Hey, can you move this function outside DPO? In model utils instead |
Thanks, code updated.
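For context, here is a minimal sketch of what such a shared helper could look like once it lives outside the DPO trainer. The function name, its exact location, and the reliance on accelerate's prepare_model with evaluation_mode=True are assumptions for illustration, not the PR's actual code:

import torch
from accelerate import Accelerator


def prepare_fsdp(model: torch.nn.Module, accelerator: Accelerator) -> torch.nn.Module:
    """Prepare a frozen reference model under FSDP (illustrative sketch only)."""
    # The reference model only runs forward passes, so it needs no optimizer
    # state; evaluation_mode=True asks accelerate to wrap and place the model
    # without setting up gradient synchronization.
    model = accelerator.prepare_model(model, evaluation_mode=True)
    model.eval()
    return model

Keeping it in a shared utils module would mirror how the DeepSpeed preparation helper is exposed, so other trainers could reuse the same code path instead of copying it.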
@faaany I have the same setup as yours and I still get the error. I had included the trainer file fsdp_dpo_trainer.txt in my first message; can you verify that the implementation is indeed correct?
Can you provide me with detailed steps to reproduce your issue?
Hi @qgallouedec @lewtun, I saw "prepare_deepspeed" got moved to "utils.py". Can we proceed with this PR as well? Thanks a lot!
Below is the result I got on CUDA:
{'train_runtime': 24.6598, 'train_samples_per_second': 4.055, 'train_steps_per_second': 0.122, 'train_loss': 0.6775782903035482, 'epoch': 0.96}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:24<00:00, 8.22s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 1.92it/s]
***** eval metrics *****
epoch = 0.96
eval_logits/chosen = -2.9692
eval_logits/rejected = -3.0303
eval_logps/chosen = -404.6238
eval_logps/rejected = -346.6078
eval_loss = 0.694
eval_rewards/accuracies = 0.4464
eval_rewards/chosen = 0.0088
eval_rewards/margins = 0.017
eval_rewards/rejected = -0.0082
eval_runtime = 0:00:04.09
eval_samples_per_second = 24.424
eval_steps_per_second = 1.71
What does this PR do?
While training with DPOTrainer using FSDP and accelerate, I got the same error as mentioned in #1147. Similar to "_prepare_deepspeed", I fixed the issue by adding a new method called "_prepare_fsdp".
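To sketch where such a method fits, the reference-model preparation can be pictured as a small dispatch over the active backend. The checks against accelerator.state and the helper signatures below are assumptions chosen for illustration, not the PR's exact implementation:

from typing import Callable, Optional

import torch
from accelerate import Accelerator


def prepare_reference_model(
    ref_model: Optional[torch.nn.Module],
    accelerator: Accelerator,
    prepare_deepspeed: Callable[[torch.nn.Module], torch.nn.Module],
    prepare_fsdp: Callable[[torch.nn.Module], torch.nn.Module],
) -> Optional[torch.nn.Module]:
    """Pick a preparation path for the frozen reference model (sketch only)."""
    if ref_model is None:
        return None
    if accelerator.state.deepspeed_plugin is not None:
        # Existing behaviour: DeepSpeed gets its own preparation path.
        return prepare_deepspeed(ref_model)
    if getattr(accelerator.state, "fsdp_plugin", None) is not None:
        # The gap this PR closes: FSDP needs an equivalent path so the frozen
        # reference model is sharded consistently with the policy model.
        return prepare_fsdp(ref_model)
    # Plain DDP / single GPU: device placement is enough for an eval-only model.
    return ref_model.to(accelerator.device)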