Commit: Add more doc

lewtun committed Nov 9, 2023
1 parent 3a54302 commit 33a0ce3
Showing 9 changed files with 54 additions and 33 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -159,5 +159,6 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

-# Temp checkpoint folder
-data/
+# Temp folders
+data/
+wandb/
6 changes: 3 additions & 3 deletions README.md
@@ -32,8 +32,8 @@ To run the code in this project, first create a Python virtual environment using
conda create -n handbook python=3.10 && conda activate handbook
```

-Next, install PyTorch `v2.0.1` - the precise version is important for reproducibility! Since this hardware-dependent, we
-direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/previous-versions/#v201).
+Next, install PyTorch `v2.1.0` - the precise version is important for reproducibility! Since this is hardware-dependent, we
+direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).
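Because the handbook pins this exact version for reproducibility, it is worth confirming what actually got installed before moving on. A minimal sanity check (the output depends on whether you installed a CPU or CUDA build):

```shell
# Print the installed PyTorch version and whether a CUDA device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```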

You can then install the remaining package dependencies as follows:

@@ -44,7 +44,7 @@ python -m pip install .
You will also need Flash Attention 2 installed, which can be done by running:

```shell
-python -m pip install flash-attn==2.3.0 --no-build-isolation
+python -m pip install flash-attn --no-build-isolation
```
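If the wheel builds successfully, a quick import check confirms it before you launch a long training run. This is just a sketch, relying on the `flash_attn` package exposing its version string:

```shell
# Verify Flash Attention 2 imports cleanly and report its version
python -c "import flash_attn; print(flash_attn.__version__)"
```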

Next, log into your Hugging Face account as follows:
4 changes: 2 additions & 2 deletions recipes/launch.slurm
@@ -3,8 +3,8 @@
#SBATCH --exclusive
#SBATCH --gres=gpu:8
#SBATCH --partition=production-cluster
-#SBATCH --output=/fsx/h4/logs/%x-%j.out
-#SBATCH --err=/fsx/h4/logs/%x-%j.err
+#SBATCH --output=/fsx/h4/logs/%x-%j.out # Adjust this to your cluster
+#SBATCH --err=/fsx/h4/logs/%x-%j.err # Adjust this to your cluster

set -x -e

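For anyone adapting the two log directives above: `%x` expands to the Slurm job name and `%j` to the job ID, and Slurm will not create the log directory for you, so it must already exist and be writable on the nodes running the job. A hypothetical adaptation might look like:

```shell
#SBATCH --output=/path/to/your/logs/%x-%j.out # Hypothetical path - any existing, writable directory works
#SBATCH --err=/path/to/your/logs/%x-%j.err
```
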
3 changes: 3 additions & 0 deletions recipes/zephyr-7b/README.md
@@ -1,5 +1,6 @@

# Instructions

In the handbook, for each training step we provide two sets of recipes:
- Full training on a multi-GPU machine (tested on an 8xA100 node), using Slurm to queue jobs.
- LORA training on a single consumer 24GB GPU (tested on a RTX 4090)
@@ -21,6 +22,7 @@ sbatch --job-name=handbook_sft --nodes=1 recipes/launch.slurm zephyr-7b sft full
```

## LORA training examples

### SFT
```shell
# locally on 1 gpu
@@ -33,6 +35,7 @@ sbatch --job-name=handbook_sft_lora --nodes=1 recipes/launch.slurm zephyr-7b sft
```

### DPO

```shell
# locally on 1 gpu
accelerate launch scripts/run_dpo.py recipes/zephyr-7b/dpo/config_lora.yaml
1 change: 0 additions & 1 deletion recipes/zephyr-7b/sft/config_lora.yaml
@@ -1,6 +1,5 @@
# Model arguments
model_name_or_path: mistralai/Mistral-7B-v0.1
model_revision: main
torch_dtype: auto
use_flash_attention_2: true

60 changes: 40 additions & 20 deletions scripts/README.md
@@ -1,38 +1,58 @@

-## Supervised Fine-Tuning (SFT)
+## Scripts to Train and Evaluate Chat Models

-We provide 3 main ways to train SFT models:
+### Fine-tuning

-* Distributed fine-tuning of all model weights with ZeRO-3
-* Fine-tuning with LoRA adapters and ZeRO-3
-* Fine-tuning with QLoRA adapters and DDP
+In the handbook, we provide two main ways to align LLMs for chat:

+- Full fine-tuning on a multi-GPU machine (tested on an 8 x A100 (80GB) node).
+- LoRA fine-tuning on a single consumer 24GB GPU (tested on a RTX 4090).

+In practice, we find comparable performance for both full and LoRA fine-tuning, with the latter having the advantage of producing small adapter weights that are fast to upload and download from the Hugging Face Hub. Here are the two general commands to fine-tune your models:

```shell
-# Full training with ZeRO-3
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/{model_name}/sft/config_full.yaml
+# Full training with ZeRO-3 on 8 GPUs
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_{task}.py recipes/{model_name}/{task}/config_full.yaml

+# LoRA training on single GPU
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_{task}.py recipes/{model_name}/{task}/config_lora.yaml
```

-# LoRA training with ZeRO-3
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/{model_name}/sft/config_16bit.yaml
+Here `{task}` refers to the type of training you wish to run (SFT, DPO, etc.), while `{model_name}` refers to the choice of recipe in the `recipes/` directory. For example, to replicate Zephyr 7B you can run:

-# QLoRA training with DDP
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml scripts/run_sft.py recipes/{model_name}/sft/config_8bit.yaml
```shell
+# Step 1 - train SFT policy
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/zephyr-7b/sft/config_full.yaml

+# Step 2 - align with DPO
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b/dpo/config_full.yaml
```

You can override the parameters in each YAML config by appending them to the command as follows:

```shell
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/{model_name}/sft/config_full.yaml --per_device_train_batch_size=2 --num_train_epochs=3
+# Change batch size, number of epochs etc
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_{task}.py recipes/{model_name}/{task}/config_full.yaml --per_device_train_batch_size=42 --num_train_epochs=5
```

-## Direct Preference Optimisation (DPO)
+By default all training metrics are logged with TensorBoard. If you have a [Weights and Biases](https://wandb.ai/site) account and are logged in, you can view the training metrics by appending `--report_to=wandb`, e.g.

```shell
-# Full training with ZeRO-3
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/{model_name}/dpo/config_full.yaml
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_{task}.py recipes/{model_name}/{task}/config_full.yaml --report_to=wandb
```

+#### Launching jobs on a Slurm cluster

-# LoRA training with ZeRO-3
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/{model_name}/dpo/config_16bit.yaml
+If you have access to a Slurm cluster, we provide a `recipes/launch.slurm` script that will automatically queue training jobs for you. Here's how you can use it:

```shell
+sbatch --job-name=handbook_{task} --nodes=1 recipes/launch.slurm {model_name} {task} {precision} {accelerator}
```

+Here `{model_name}` and `{task}` are defined as above, while `{precision}` refers to the type of training (full vs LoRA) and `{accelerator}` refers to the choice of 🤗 Accelerate config in `recipes/accelerate_configs`. Here's a concrete example to run SFT on 1 node of 8 GPUs:

```shell
+sbatch --job-name=handbook_sft --nodes=1 recipes/launch.slurm zephyr-7b sft full deepspeed_zero3
```
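Once a job like the one above is queued, standard Slurm tooling is enough to keep an eye on it. A small sketch, where `<job_id>` stands in for the ID that `sbatch` prints and the log path follows the `%x-%j` pattern configured in `recipes/launch.slurm`:

```shell
# Check your queued/running jobs, then follow the training log once it starts
squeue -u $USER
tail -f /fsx/h4/logs/handbook_sft-<job_id>.out
```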

-# QLoRA training with DDP
-ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml scripts/run_dpo.py recipes/{model_name}/dpo/config_8bit.yaml
```
+**Note:** the configuration in `recipes/launch.slurm` is optimised for the Hugging Face Compute Cluster and may require tweaking to be adapted to your own compute nodes.
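Putting the pieces in this README together, a single-GPU LoRA run with a config override and Weights and Biases logging would look roughly like the sketch below; every flag is taken from the examples above, so adjust the values to your own setup:

```shell
# LoRA SFT for Zephyr 7B on one GPU, with a smaller per-device batch size and metrics sent to W&B
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 \
    scripts/run_sft.py recipes/zephyr-7b/sft/config_lora.yaml \
    --per_device_train_batch_size=2 --report_to=wandb
```
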
1 change: 1 addition & 0 deletions scripts/run_dpo.py
@@ -175,6 +175,7 @@ def main():
    kwargs = {
        "finetuned_from": model_args.model_name_or_path,
        "dataset": list(data_args.dataset_mixer.keys()),
+        "dataset_tags": list(data_args.dataset_mixer.keys()),
        "tags": ["alignment-handbook"],
    }
    dpo_trainer.create_model_card(**kwargs)
5 changes: 1 addition & 4 deletions scripts/run_sft.py
@@ -82,13 +82,9 @@ def main():
    # Load datasets
    ###############
    raw_datasets = get_datasets(data_args, splits=data_args.dataset_splits)

    logger.info(
        f"Training on the following datasets and their proportions: {[split + ' : ' + str(dset.num_rows) for split, dset in raw_datasets.items()]}"
    )
    with training_args.main_process_first(desc="Log a few random samples from the raw training set"):
        for index in random.sample(range(len(raw_datasets["train"])), 3):
            logger.info(f"Sample {index} of the raw training set:\n\n{raw_datasets['train'][index]['messages']}")

    ################
    # Load tokenizer
@@ -176,6 +172,7 @@ def main():
    kwargs = {
        "finetuned_from": model_args.model_name_or_path,
        "dataset": list(data_args.dataset_mixer.keys()),
+        "dataset_tags": list(data_args.dataset_mixer.keys()),
        "tags": ["alignment-handbook"],
    }
    trainer.create_model_card(**kwargs)
2 changes: 1 addition & 1 deletion setup.py
@@ -63,7 +63,7 @@
"scipy",
"tensorboard",
"transformers==4.35.0",
"trl==0.7.4", # TODO bump to next release, added for NEFTune
"trl==0.7.4",
"tqdm>=4.64.1",
]

