Commit

🖊 Fix typos (#2673)
* fix typos

* fix typo

* fix typo

* fix typos

* fix typos

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo
omahs authored Jan 28, 2025
1 parent 1123bd0 commit 4659ad9
Showing 15 changed files with 23 additions and 23 deletions.
docs/source/bco_trainer.md (4 changes: 2 additions & 2 deletions)

@@ -62,7 +62,7 @@ embedding_model = Accelerator().prepare_model(self.embedding_model)
embedding_func = partial(embed_prompt, model=embedding_model)
```

-Set `prompt_sample_size` to defined how many prompts are selected to train the UDM classifier and start the training with the provided embedding function:
+Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier and start the training with the provided embedding function:

```py
training_args = BCOConfig(
@@ -97,4 +97,4 @@ To scale how much the auxiliary loss contributes to the total loss, use the hype

## BCOConfig

-[[autodoc]] BCOConfig
+[[autodoc]] BCOConfig
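
For orientation, here is a minimal, hypothetical sketch of how the pieces discussed in this file fit together, assuming the `BCOConfig`/`BCOTrainer` interface referenced above; the model names, the pooling inside `embed_prompt`, the dataset, and the hyperparameter values are illustrative, not part of this commit:

```python
from functools import partial

from accelerate import Accelerator
from datasets import load_dataset
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
from trl import BCOConfig, BCOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Embedding model used to train the UDM classifier (illustrative choice).
embedding_model = AutoModel.from_pretrained("thenlper/gte-base")
embedding_tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base")
embedding_model = Accelerator().prepare_model(embedding_model)

def embed_prompt(input_ids, attention_mask, model):
    # Assumed pooling scheme: mean over the last hidden state.
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    return outputs.last_hidden_state.mean(dim=1)

embedding_func = partial(embed_prompt, model=embedding_model)

# prompt_sample_size controls how many prompts are drawn to train the UDM classifier.
training_args = BCOConfig(output_dir="bco-model", prompt_sample_size=512)

trainer = BCOTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset("trl-lib/kto-mix-14k", split="train"),
    processing_class=tokenizer,
    embedding_func=embedding_func,
    embedding_tokenizer=embedding_tokenizer,
)
trainer.train()
```
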
docs/source/ddpo_trainer.md (6 changes: 3 additions & 3 deletions)

@@ -14,8 +14,8 @@
## Getting started with Stable Diffusion finetuning with reinforcement learning

The machinery for finetuning of Stable Diffusion models with reinforcement learning makes heavy use of HuggingFace's `diffusers`
-library. A reason for stating this is that getting started requires a bit of familiarity with the `diffusers` library concepts, mainly two of them - pipelines and schedulers.
-Right out of the box (`diffusers` library), there isn't a `Pipeline` nor a `Scheduler` instance that is suitable for finetuning with reinforcement learning. Some adjustments need to made.
+library. A reason for stating this is that getting started requires a bit of familiarity with the `diffusers` library concepts, mainly two of them - pipelines and schedulers.
+Right out of the box (`diffusers` library), there isn't a `Pipeline` nor a `Scheduler` instance that is suitable for finetuning with reinforcement learning. Some adjustments need to be made.

There is a pipeline interface that is provided by this library that is required to be implemented to be used with the `DDPOTrainer`, which is the main machinery for fine-tuning Stable Diffusion with reinforcement learning. **Note: Only the StableDiffusion architecture is supported at this point.**
There is a default implementation of this interface that you can use out of the box. Assuming the default implementation is sufficient and/or to get things moving, refer to the training example alongside this guide.
@@ -26,7 +26,7 @@ For a more detailed look into the interface and the associated default implement

Note that the default implementation has a LoRA implementation path and a non-LoRA based implementation path. The LoRA flag enabled by default and this can be turned off by passing in the flag to do so. LORA based training is faster and the LORA associated model hyperparameters responsible for model convergence aren't as finicky as non-LORA based training.

-Also in addition, there is the expectation of providing a reward function and a prompt function. The reward function is used to evaluate the generated images and the prompt function is used to generate the prompts that are used to generate the images.
+Also in addition, there is the expectation of providing a reward function and a prompt function. The reward function is used to evaluate the generated images and the prompt function is used to generate the prompts that are used to generate the images.
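
A minimal sketch of how these two callables could be wired into the trainer, assuming the `DDPOConfig`/`DDPOTrainer`/`DefaultDDPOStableDiffusionPipeline` interface described above (requires `trl[diffusers]`); the prompt, the toy brightness reward, and the base model are illustrative placeholders:

```python
from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline

def prompt_fn():
    # Return a prompt plus any metadata the reward function may want later.
    return "a photo of a corgi wearing a hat", {}

def reward_fn(images, prompts, metadata):
    # Toy reward: prefer brighter images. A real setup would plug in a learned
    # scorer here instead.
    rewards = images.float().mean(dim=(1, 2, 3))
    return rewards, {}

# Default pipeline implementation; the LoRA path is enabled by default.
pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5", use_lora=True
)

config = DDPOConfig(num_epochs=1)
trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```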

## Getting started with `examples/scripts/ddpo.py`

docs/source/detoxifying_a_lm.md (6 changes: 3 additions & 3 deletions)

@@ -45,7 +45,7 @@ When doing PPO, it is very important to design the problem efficiently so that t

### Pre-processing the dataset

-The dataset consist of prompts and their continuations, and each of them has an associated `toxicity` score.
+The dataset consists of prompts and their continuations, and each of them has an associated `toxicity` score.

A `prompt` example:
```
@@ -109,7 +109,7 @@ ref_model = create_reference_model(model, num_shared_layers=6)
trainer = PPOTrainer(..., ref_model=ref_model)
```

-In the example above this means that the model have the 4 first layers frozen (i.e. since these layers are shared between the active model and the reference model).
+In the example above this means that the model has the 4 first layers frozen (i.e. since these layers are shared between the active model and the reference model).

- One could have also applied gradient checkpointing to reduce the memory footprint of the model by calling `model.pretrained_model.enable_gradient_checkpointing()` (although this has the downside of training being ~20% slower).

@@ -176,7 +176,7 @@ The evaluation script can be found [here](https://github.com/huggingface/trl/blo

The results are quite promising, as we can see that the models are able to reduce the toxicity score of the generated text by an interesting margin. The gap is clear for `gpt-neo-2B` model but we less so for the `gpt-j-6B` model. There are several things we could try to improve the results on the largest model starting with training with larger `mini_batch_size` and probably allowing to back-propagate through more layers (i.e. use less shared layers).

-To sum up, in addition to human feedback this could be a useful additional signal when training large language models to ensure there outputs are less toxic as well as useful.
+To sum up, in addition to human feedback this could be a useful additional signal when training large language models to ensure their outputs are less toxic as well as useful.

### Limitations

examples/datasets/hh-rlhf-helpful-base.py (2 changes: 1 addition & 1 deletion)

@@ -110,7 +110,7 @@ def extract_dialogue(example: str) -> list[dict[str, str]]:
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)
Columns:
-- `"pompt"`: The user query.
+- `"prompt"`: The user query.
- `"chosen"`: A response deemed helpful by human evaluators.
- `"rejected"`: A response considered less helpful or unhelpful.
examples/datasets/lm-human-preferences-descriptiveness.py (2 changes: 1 addition & 1 deletion)

@@ -82,7 +82,7 @@ def to_prompt_completion(example, tokenizer):
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)
Columns:
-- `"pompt"`: The text sample.
+- `"prompt"`: The text sample.
- `"chosen"`: A version of the text with enhanced descriptiveness.
- `"rejected"`: A version of the text with less descriptiveness.
examples/datasets/lm-human-preferences-sentiment.py (2 changes: 1 addition & 1 deletion)

@@ -77,7 +77,7 @@ def to_prompt_completion(example, tokenizer):
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)
Columns:
-- `"pompt"`: The text sample.
+- `"prompt"`: The text sample.
- `"chosen"`: A version of the text that conveys the desired sentiment.
- `"rejected"`: A version of the text that does not convey the desired sentiment.
examples/datasets/math_shepherd.py (2 changes: 1 addition & 1 deletion)

@@ -141,7 +141,7 @@ def process_example(example):
- **Type**: [Stepwise supervision](https://huggingface.co/docs/trl/main/dataset_formats#stepwise-supervision)
Columns:
-- `"pompt"`: The problem statement.
+- `"prompt"`: The problem statement.
- `"completions"`: A list of reasoning steps generated to solve the problem.
- `"labels"`: A list of booleans or floats indicating the correctness of each corresponding reasoning step.
examples/datasets/prm800k.py (2 changes: 1 addition & 1 deletion)

@@ -115,7 +115,7 @@ def process_batch(examples):
- **Type**: [Stepwise supervision](https://huggingface.co/docs/trl/main/dataset_formats#stepwise-supervision)
Columns:
-- `"pompt"`: The problem statement.
+- `"prompt"`: The problem statement.
- `"completions"`: A list of reasoning steps generated to solve the problem.
- `"labels"`: A list of booleans or floats indicating the correctness of each corresponding reasoning step.
examples/datasets/rlaif-v.py (2 changes: 1 addition & 1 deletion)

@@ -77,7 +77,7 @@ def to_conversational(example):
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)
Columns:
-- `"pompt"`: The task related to the image.
+- `"prompt"`: The task related to the image.
- `"images"`: The image.
- `"chosen"`: The preferred answer.
- `"rejected"`: An alternative answer that was not preferred.
examples/datasets/tldr.py (2 changes: 1 addition & 1 deletion)

@@ -72,7 +72,7 @@ def to_prompt_completion(example):
- **Type**: [Prompt-completion](https://huggingface.co/docs/trl/main/dataset_formats#prompt-completion)
Columns:
-- `"pompt"`: The unabridged Reddit post.
+- `"prompt"`: The unabridged Reddit post.
- `"completion"`: The concise "TL;DR" summary appended by the author.
This structure enables models to learn the relationship between detailed content and its abbreviated form, enhancing their summarization capabilities.
examples/datasets/tldr_preference.py (2 changes: 1 addition & 1 deletion)

@@ -83,7 +83,7 @@ def to_preference(example):
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)
Columns:
-- `"pompt"`: The unabridged Reddit post.
+- `"prompt"`: The unabridged Reddit post.
- `"chosen"`: The concise "TL;DR" summary appended by the author.
- `"rejected"`: An alternative summary or response that was not selected.
examples/datasets/ultrafeedback-prompt.py (2 changes: 1 addition & 1 deletion)

@@ -77,7 +77,7 @@ def drop_long_prompt(example):
- **Type**: [Prompt-only](https://huggingface.co/docs/trl/main/dataset_formats#prompt-only)
Column:
-- `"pompt"`: The input question or instruction provided to the model.
+- `"prompt"`: The input question or instruction provided to the model.
## Generation script
examples/datasets/ultrafeedback.py (2 changes: 1 addition & 1 deletion)

@@ -112,7 +112,7 @@ def to_unpaired_preference(example, model_name, aspect):
- **Type**: [Unpaired preference](https://huggingface.co/docs/trl/main/dataset_formats#unpaired-preference)
Column:
-- `"pompt"`: The input question or instruction provided to the model.
+- `"prompt"`: The input question or instruction provided to the model.
- `"completion"`: The model's response to the prompt.
- `"label"`: A binary value indicating whether the response is sufficiently helpful.
tests/test_data_utils.py (8 changes: 4 additions & 4 deletions)

@@ -41,7 +41,7 @@ class IsConversationalTester(unittest.TestCase):
{ # Prompt only
"prompt": [{"role": "user", "content": "What color is the sky?"}],
},
-{ # Pompt-completion
+{ # Prompt-completion
"prompt": [{"role": "user", "content": "What color is the sky?"}],
"completion": [{"role": "assistant", "content": "It is blue."}],
},
@@ -110,7 +110,7 @@ class ApplyChatTemplateTester(unittest.TestCase):
{ # Prompt only
"prompt": [{"role": "user", "content": "What color is the sky?"}],
},
-{ # Pompt-completion
+{ # Prompt-completion
"prompt": [{"role": "user", "content": "What color is the sky?"}],
"completion": [{"role": "assistant", "content": "It is blue."}],
},
@@ -153,7 +153,7 @@ def test_apply_chat_template(self, tokenizer_id, example):
# Checking if the result is a dictionary
self.assertIsInstance(result, dict)

-# The chat template should be applied to the the following keys
+# The chat template should be applied to the following keys
for key in ["prompt", "chosen", "rejected", "completion"]:
if key in example:
self.assertIn(key, result)
@@ -179,7 +179,7 @@ def test_maybe_apply_chat_template(self, tokenizer_id, example):
# Checking if the result is a dictionary
self.assertIsInstance(result, dict)

-# The chat template should be applied to the the following keys
+# The chat template should be applied to the following keys
for key in ["prompt", "chosen", "rejected", "completion"]:
if key in example:
self.assertIn(key, result)
trl/trainer/gkd_trainer.py (2 changes: 1 addition & 1 deletion)

@@ -86,7 +86,7 @@ def __init__(
peft_config: Optional["PeftConfig"] = None,
formatting_func: Optional[Callable] = None,
):
-# add remove_unused_columns=False to the the dataclass args
+# add remove_unused_columns=False to the dataclass args
args.remove_unused_columns = False
data_collator = DataCollatorForChatML(tokenizer=processing_class, max_length=args.max_seq_length)

