SFTTrainer training very slow on GPU. Is this training speed expected? #2378
pledominykas asked this question in Q&A (unanswered).
I am currently trying to perform full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40 GB VRAM) on Google Colab. However, training is very slow: ~0.06 it/s.
Here is my code:
```python
from datasets import load_dataset
from trl import SFTTrainer

# `model` and `tokenizer` for ai-forever/mGPT are created beforehand
# (see the sketch below).
dataset = load_dataset("allenai/c4", "lt")
train_dataset = dataset["train"]
eval_dataset = dataset["validation"]
train_dataset = train_dataset.take(10000)
eval_dataset = eval_dataset.take(1000)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer_stats = trainer.train()
```
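The loading of `model` and `tokenizer` is not shown in the snippet; roughly, it is along these lines (the exact arguments, e.g. dtype or device placement, are omitted here, so treat this as a sketch):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of the loading step (exact arguments not shown in the post).
tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
```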
And the trainer output:
![Trainer output screenshot](https://private-user-images.githubusercontent.com/67858373/388631550-8e822509-3366-4e05-9b97-99f7443a8187.png)
According to the progress bar, it will take ~10 hours to process the 10k examples from the C4 dataset.
These are the relevant package versions and a screenshot of GPU usage:
```
Package       Version
accelerate    0.34.2
bitsandbytes  0.44.1
datasets      3.1.0
peft          0.13.2
torch         2.5.0+cu121
trl           0.12.0
```

![GPU usage screenshot](https://private-user-images.githubusercontent.com/67858373/388631812-39911cfc-14aa-437f-9a90-52ca47e29ba0.png)
The model does appear to be loaded onto the GPU, but for some reason training is still very slow.
I tried passing `keep_in_memory=True` when loading the dataset, but it did not help.
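Roughly, that attempt was just a one-flag change to the call above (`keep_in_memory` is a standard `load_dataset` argument):

```python
# Same call as above, but asking `datasets` to keep the data in RAM
# instead of memory-mapping the Arrow files from disk.
dataset = load_dataset("allenai/c4", "lt", keep_in_memory=True)
```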
I also tried pre-tokenizing the dataset and using `Trainer` instead of `SFTTrainer`, but the performance was similar.
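That attempt looked roughly like the sketch below; the column names, batch size, and collator are filled in as plausible assumptions rather than the exact values used:

```python
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Tokenize once up front instead of letting SFTTrainer do it on the fly.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

# "text", "timestamp", "url" are the raw columns of allenai/c4.
tokenized_train = train_dataset.map(tokenize, batched=True,
                                    remove_columns=["text", "timestamp", "url"])
tokenized_eval = eval_dataset.map(tokenize, batched=True,
                                  remove_columns=["text", "timestamp", "url"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8),
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    # mlm=False makes the collator build causal-LM labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```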
Is this the expected training speed, or is there some issue with my code? And if it is an issue, what would a possible fix be?