v1.1.0

Released by @github-actions on 18 Sep 22:50 · 169 commits to main since this release

What's new

Added 🎉

  • Added support for changing train sequence length when loading a checkpoint.
  • Added support for sequence length warm-up during training via the callback SequenceLengthSchedulerCallback.
  • Added support for variable sequence length (VSL) datasets and VSL curriculums as introduced in "Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum".
  • Added Lion and SkipStepLion optimizers.
  • Added init_seed argument to Transformer and TransformerConfig.
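The sequence length warm-up added here gradually grows the training sequence length toward its target. As a rough illustration of the idea (a hypothetical helper in plain Python, not the library's SequenceLengthSchedulerCallback API; the doubling convention is an assumption):

```python
import math

def seq_len_at_step(step, warmup_steps, min_seq_len=128, max_seq_len=1024):
    """Illustrative warm-up schedule: grow the sequence length from
    min_seq_len to max_seq_len over warmup_steps, doubling at evenly
    spaced points so every intermediate length is min_seq_len * 2**k.
    Hypothetical sketch, not the callback's actual implementation."""
    if step >= warmup_steps:
        return max_seq_len
    # Number of doublings needed to get from min_seq_len to max_seq_len.
    n_doublings = int(math.log2(max_seq_len // min_seq_len))
    # Which doubling phase the current step falls into (0..n_doublings).
    phase = (step * (n_doublings + 1)) // warmup_steps
    return min(min_seq_len * 2 ** phase, max_seq_len)
```

For example, with warmup_steps=100 and the defaults above, the schedule moves through 128, 256, 512, and 1024 tokens in equal-length phases before settling at the full length.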

Changed ⚠️

  • Renamed MemMapDataset to NumpyFSLDataset.
  • Batch size is now specified in tokens, not instances.
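With batch size now denominated in tokens rather than instances, the number of instances per batch follows from the current sequence length, which is what makes a token-denominated batch size compatible with variable sequence lengths. A quick sanity-check of the arithmetic (illustrative helper, not library code):

```python
def instances_per_batch(global_batch_size_tokens, seq_len):
    """Convert a token-denominated batch size into a per-batch
    instance count for a given sequence length. The token count
    must divide evenly by the sequence length."""
    if global_batch_size_tokens % seq_len != 0:
        raise ValueError("batch size in tokens must be a multiple of seq_len")
    return global_batch_size_tokens // seq_len
```

Note that doubling the sequence length halves the instance count while the number of tokens per optimizer step stays fixed, e.g. 524288 tokens is 512 instances at a sequence length of 1024 but 256 instances at 2048.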

Commits

3aa7a1b Allow configuring model init seed, make iterable dataset classes into data loaders (#49)
bf48f80 Big changes to dataset API, adds support for variable sequence length training (#48)
3f4bbd3 update configs for latest architecture (#47)
ecfd2d0 Ensure speed monitor can handle variable sequence length (#46)
12c629a Add a callback for sequence length warm-up scheduling (#45)
603952b Clean up optimizer API (#44)
5036d90 Add Lion optimizer and a "skip step" version of it (#43)
50857c4 split up optim module
cef56ea Rename MemMapDataset to NumpyDataset (#42)
6c0ccf3 Add support for sequence length warm-up (#41)
4b65602 minor clean up of distributed utils