chore(release): merge set of changes for v2.4.0 #439

Closed
wants to merge 8 commits
3 changes: 1 addition & 2 deletions .github/workflows/image.yaml
@@ -15,9 +15,8 @@ jobs:
sudo swapoff -a
sudo rm -f /swapfile
sudo apt clean
docker rmi $(docker image ls -aq)
if [ "$(docker image ls -q)" ]; then docker rmi $(docker image ls -aq); fi
df -h
- name: Build image
run: |
docker build -t fms-hf-tuning:dev . -f build/Dockerfile

21 changes: 18 additions & 3 deletions README.md
@@ -1,7 +1,7 @@
# FMS HF Tuning

- [Installation](#installation)
- [Data format](#data-format)
- [Data format support](#data-support)
- [Supported Models](#supported-models)
- [Training](#training)
- [Single GPU](#single-gpu)
@@ -62,13 +62,13 @@ pip install fms-hf-tuning[aim]
For more details on how to enable and use the trackers, please see [the experiment tracking section below](#experiment-tracking).

## Data Support
Users can pass training data in a single file using the `--training_data_path` argument along with other arguments required for various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below) and the file can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.
Users can pass training data as either a single file or a Hugging Face dataset ID using the `--training_data_path` argument, along with other arguments required for the various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below). If the user chooses to pass a file, it can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.


Below we list the data use cases supported via the `--training_data_path` argument. For details on our advanced data preprocessing, see [Advanced Data Preprocessing](./docs/advanced-data-preprocessing.md).

## Supported Data Formats
We support the following data formats via `--training_data_path` argument
We support the following file formats via the `--training_data_path` argument:

Data Format | Tested Support
------------|---------------
@@ -77,6 +77,8 @@ JSONL | ✅
PARQUET | ✅
ARROW | ✅

As mentioned above, we also support passing a Hugging Face dataset ID directly via the `--training_data_path` argument.
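
For illustration, the sketch below shows both options programmatically. The argument field names (`training_data_path`, `dataset_text_field`, `response_template`) and the `sft_trainer.train(...)` call mirror the tests in this PR; the import paths and dataclass names are assumptions and may differ from the actual package layout.

```python
# Minimal sketch, not a verified example: import paths and dataclass names are assumed;
# the field names and the train() call mirror the acceleration tests added in this PR.
from tuning import sft_trainer
from tuning.config import configs  # assumed module exposing the argument dataclasses

model_args = configs.ModelArguments(model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v0.3")
train_args = configs.TrainingArguments(output_dir="out", num_train_epochs=1)

data_args = configs.DataArguments(
    # Option 1: a file in one of the supported formats (JSON, JSONL, PARQUET, ARROW)
    training_data_path="twitter_complaints_small.json",
    # Option 2: a Hugging Face dataset ID passed directly, e.g.
    # training_data_path="org-name/dataset-name",  # hypothetical dataset ID
    dataset_text_field="output",
    response_template="\n\n### Label:",
)

sft_trainer.train(model_args, data_args, train_args)
```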

## Use cases supported with `training_data_path` argument

### 1. Data formats with a single sequence and a specified response_template to use for masking on completion.
@@ -742,6 +744,8 @@ The list of configurations for various `fms_acceleration` plugins:
- [attention_and_distributed_packing](./tuning/config/acceleration_configs/attention_and_distributed_packing.py):
- `--padding_free`: technique to process multiple examples in a single batch without adding padding tokens that waste compute.
- `--multipack`: technique for *multi-gpu training* to balance out the number of tokens processed on each device, to minimize waiting time.
- [fast_moe_config](./tuning/config/acceleration_configs/fast_moe.py) (experimental):
- `--fast_moe`: trains MoE models in parallel, increasing throughput and decreasing memory usage (a parsing sketch follows this list).
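
As a quick illustration of how the new `--fast_moe` flag is parsed into its dataclass, the sketch below mirrors `test_dataclass_parse_successfully` from the tests added in this PR:

```python
# Mirrors tests/acceleration/test_acceleration_dataclasses.py from this PR:
# "--fast_moe 1" parses into a FastMoe instance inside FastMoeConfig.
import transformers

from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig

parser = transformers.HfArgumentParser(dataclass_types=FastMoeConfig)
(cfg,) = parser.parse_args_into_dataclasses(["--fast_moe", "1"])
assert isinstance(cfg.fast_moe, FastMoe)
```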

Notes:
* `quantized_lora_config` requires that it be used along with the LoRA tuning technique. See the [LoRA tuning section](https://github.com/foundation-model-stack/fms-hf-tuning/tree/main?tab=readme-ov-file#lora-tuning-example) for the LoRA parameters to pass.
@@ -760,6 +764,17 @@ Notes:
* Notes on Multipack
- works only for *multi-gpu*.
- currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.
* Notes on Fast MoE
- `--fast_moe` is an integer value that configures the degree of expert-parallel sharding (`ep_degree`); a programmatic sketch follows these notes.
- `world_size` must be divisible by `ep_degree`.
- Running Fast MoE modifies the model's state dict, so checkpoints must be post-processed using [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) before running inference (HF, vLLM, etc.).
- The typical way to use this script is to run:
```
python -m fms_acceleration_moe.utils.checkpoint_utils \
<checkpoint file> \
<output file> \
<original model>
```
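
For programmatic use, a minimal sketch of enabling Fast MoE is shown below. The `FastMoe`/`FastMoeConfig` dataclasses and the `fast_moe_config` keyword mirror the tests added in this PR; wrapping them in a helper function is purely illustrative. Remember to post-process the resulting checkpoint with the checkpoint utils above before serving it.

```python
# Sketch mirroring the acceleration tests in this PR: build the Fast MoE config
# and hand it to sft_trainer.train(). The helper function itself is illustrative.
from tuning import sft_trainer
from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig


def train_with_fast_moe(model_args, data_args, train_args, ep_degree: int = 1):
    """Train an MoE model with expert-parallel sharding; ep_degree must divide world_size."""
    moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=ep_degree))
    return sft_trainer.train(
        model_args,
        data_args,
        train_args,
        fast_moe_config=moe_config,
    )
```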

Note: To pass the above flags via a JSON config, each of the flags expects its value to be a mixed-type list, so the values must be given as a list.
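A minimal sketch of writing such a config, assuming the `fast_moe` and `padding_free` flags described above (the values shown are illustrative only):

```python
# Illustrative sketch only: each flag's value is wrapped in a list, as required above.
import json

acceleration_config = {
    "fast_moe": [1],                  # integer ep_degree, given as a single-element list
    "padding_free": ["huggingface"],  # illustrative value
}

with open("acceleration_config.json", "w") as f:
    json.dump(acceleration_config, f, indent=2)
```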
8 changes: 8 additions & 0 deletions tests/acceleration/test_acceleration_dataclasses.py
@@ -28,6 +28,7 @@
MultiPack,
PaddingFree,
)
from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
from tuning.config.acceleration_configs.fused_ops_and_kernels import (
FastKernelsConfig,
FusedLoraConfig,
@@ -88,6 +89,13 @@ def test_dataclass_parse_successfully():
)
assert isinstance(cfg.multipack, MultiPack)

# 5. Specifying "--fast_moe" will parse a FastMoe class
parser = transformers.HfArgumentParser(dataclass_types=FastMoeConfig)
(cfg,) = parser.parse_args_into_dataclasses(
["--fast_moe", "1"],
)
assert isinstance(cfg.fast_moe, FastMoe)


def test_two_dataclasses_parse_successfully_together():
"""Ensure that the two dataclasses can parse arguments successfully
160 changes: 157 additions & 3 deletions tests/acceleration/test_acceleration_framework.py
@@ -43,6 +43,7 @@
MultiPack,
PaddingFree,
)
from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
from tuning.config.acceleration_configs.fused_ops_and_kernels import (
FastKernelsConfig,
FusedLoraConfig,
@@ -56,7 +57,8 @@
# for some reason the CI will raise an import error if we try to import
# these from tests.artifacts.testdata
TWITTER_COMPLAINTS_JSON_FORMAT = os.path.join(
os.path.dirname(__file__), "../artifacts/testdata/twitter_complaints_json.json"
os.path.dirname(__file__),
"../artifacts/testdata/json/twitter_complaints_small.json",
)
TWITTER_COMPLAINTS_TOKENIZED = os.path.join(
os.path.dirname(__file__),
@@ -87,6 +89,10 @@
# Third Party
from fms_acceleration_aadp import PaddingFreeAccelerationPlugin

if is_fms_accelerate_available(plugins="moe"):
# Third Party
from fms_acceleration_moe import ScatterMoEAccelerationPlugin


# There are more extensive unit tests in the
# https://github.com/foundation-model-stack/fms-acceleration
@@ -360,7 +366,7 @@ def test_framework_raises_due_to_invalid_arguments(
acceleration_configs_map,
ids=["bitsandbytes", "auto_gptq"],
)
def test_framework_intialized_properly_peft(
def test_framework_initialized_properly_peft(
quantized_lora_config, model_name_or_path, mock_and_spy
):
"""Ensure that specifying a properly configured acceleration dataclass
@@ -412,7 +418,7 @@ def test_framework_intialized_properly_peft(
"and foak plugins"
),
)
def test_framework_intialized_properly_foak():
def test_framework_initialized_properly_foak():
"""Ensure that specifying a properly configured acceleration dataclass
properly activates the framework plugin and runs the train successfully.
"""
@@ -477,6 +483,60 @@ def test_framework_intialized_properly_foak():
assert spy2["get_ready_for_train_calls"] == 1


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_framework_initialized_properly_moe():
"""Ensure that specifying a properly configured acceleration dataclass
properly activates the framework plugin and runs the train successfully.
"""

with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))

# create mocked plugin class for spying
MockedPlugin1, spy = create_mock_plugin_class_and_spy(
"FastMoeMock", ScatterMoEAccelerationPlugin
)

# 1. mock a plugin class
# 2. register the mocked plugins
# 3. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], MockedPlugin1),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)

# spy inside the train to ensure that the fast moe plugin is called
assert spy["model_loader_calls"] == 1
assert spy["augmentation_calls"] == 0
assert spy["get_ready_for_train_calls"] == 1


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="aadp"),
reason="Only runs if fms-accelerate is installed along with \
@@ -661,6 +721,100 @@ def test_error_raised_with_fused_lora_enabled_without_quantized_argument():
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_error_raised_with_undividable_fastmoe_argument():
"""
Ensure error is thrown when `--fast_moe` is passed and world_size
is not divisible by ep_degree
"""
with pytest.raises(
AssertionError, match="world size \\(1\\) not divisible by ep_size \\(3\\)"
):
with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=3))

# 1. mock a plugin class
# 2. register the mocked plugins
# 3. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="moe"),
reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
)
def test_error_raised_fast_moe_with_non_moe_model():
"""
Ensure error is thrown when `--fast_moe` is passed and model is not MoE
"""
with pytest.raises(
AttributeError,
match="'LlamaConfig' object has no attribute 'num_local_experts'",
):
with tempfile.TemporaryDirectory() as tempdir:

model_args = copy.deepcopy(MODEL_ARGS)
model_args.model_name_or_path = "TinyLlama/TinyLlama-1.1B-Chat-v0.3"
model_args.torch_dtype = torch.bfloat16
train_args = copy.deepcopy(TRAIN_ARGS)
train_args.output_dir = tempdir
train_args.save_strategy = "no"
train_args.bf16 = True
data_args = copy.deepcopy(DATA_ARGS)
data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
data_args.response_template = "\n\n### Label:"
data_args.dataset_text_field = "output"

# initialize a config
moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))

# 1. mock a plugin class
# 2. register the mocked plugins
# 3. call sft_trainer.train
with build_framework_and_maybe_instantiate(
[
(["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
],
instantiate=False,
):
with instantiate_model_patcher():
sft_trainer.train(
model_args,
data_args,
train_args,
fast_moe_config=moe_config,
)


@pytest.mark.skipif(
not is_fms_accelerate_available(plugins="foak"),
reason="Only runs if fms-accelerate is installed along with \