v1.1.0
What's new
Added 🎉
- Added support for changing train sequence length when loading a checkpoint.
- Added support for sequence length warm-up during training via the callback `SequenceLengthSchedulerCallback` (a generic sketch of the scheduling logic follows this list).
- Added support for variable sequence length (VSL) datasets and VSL curriculums as introduced in "Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum".
- Added `Lion` and `SkipStepLion` optimizers (the Lion update rule is sketched after this list).
- Added `init_seed` argument to `Transformer` and `TransformerConfig`.
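Sequence length warm-up starts training on short sequences and grows them over the early part of a run. Below is a minimal, generic sketch of that scheduling logic; the function name, defaults, and rounding behavior are illustrative assumptions, not the actual API of `SequenceLengthSchedulerCallback`.

```python
def warmup_sequence_length(step: int, warmup_steps: int,
                           min_len: int = 256, max_len: int = 4096,
                           multiple: int = 256) -> int:
    """Illustrative only: linearly grow the training sequence length from
    min_len to max_len over warmup_steps, rounded down to a multiple so
    batch shapes stay friendly. Not the library's callback signature."""
    if step >= warmup_steps:
        return max_len
    length = min_len + (max_len - min_len) * step / warmup_steps
    return max(min_len, int(length) // multiple * multiple)
```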
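Lion (from "Symbolic Discovery of Optimization Algorithms", Chen et al., 2023) updates weights with the sign of an interpolated momentum rather than an Adam-style ratio, so it only needs one state tensor per parameter. A minimal sketch of one update step follows; the function is illustrative rather than this repo's optimizer class, and the "skip step" behavior of `SkipStepLion` is not detailed here.

```python
import torch

@torch.no_grad()
def lion_step(param: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
              lr: float = 1e-4, betas: tuple = (0.9, 0.99),
              weight_decay: float = 0.0) -> None:
    """One Lion update (illustrative, not this repo's optimizer API)."""
    # Update direction: the sign of a blend of momentum and the fresh gradient.
    update = (betas[0] * momentum + (1 - betas[0]) * grad).sign()
    # Decoupled weight decay, as in AdamW.
    param.mul_(1 - lr * weight_decay)
    param.add_(update, alpha=-lr)
    # Momentum is an EMA of the gradient, tracked with the second beta.
    momentum.mul_(betas[1]).add_(grad, alpha=1 - betas[1])
```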
Changed ⚠️
- Renamed `MemMapDataset` to `NumpyFSLDataset`.
- Batch size is now specified in tokens, not instances (see the note after this list).
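Since batch size is now counted in tokens, a config that previously requested N instances should request N × sequence_length tokens. The conversion is simple arithmetic (the variable names below are illustrative, not config fields):

```python
# 512 instances at a fixed sequence length of 2048 tokens each:
instances_per_batch = 512
sequence_length = 2048
global_batch_size_in_tokens = instances_per_batch * sequence_length  # 1,048,576 tokens
```

Counting in tokens keeps per-batch compute roughly constant under VSL training, where the number of instances per batch varies with each batch's sequence length.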
Commits
3aa7a1b Allow configuring model init seed, make iterable dataset classes into data loaders (#49)
bf48f80 Big changes to dataset API, adds support for variable sequence length training (#48)
3f4bbd3 update configs for latest architecture (#47)
ecfd2d0 Ensure speed monitor can handle variable sequence length (#46)
12c629a Add a callback for sequence length warm-up scheduling (#45)
603952b Clean up optimizer API (#44)
5036d90 Add Lion optimizer and a "skip step" version of it (#43)
50857c4 split up optim module
cef56ea Rename MemMapDataset to NumpyDataset (#42)
6c0ccf3 Add support for sequence length warm-up (#41)
4b65602 minor clean up of distributed utils