Releases: allenai/OLMo-core

v1.3.0

26 Sep 16:05

What's new

Added 🎉

  • Added torchao to the Docker/Beaker images.
  • Added support for torchao float8 training via the Float8HandlerCallback (see the sketch below).
  • Added Callback.post_attach() method.
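
Float8 training is enabled by attaching the new callback to the trainer. A minimal sketch, assuming the callback needs no required constructor arguments; the import path and the add_callback helper shown here are hypothetical, not the confirmed API:

```python
from olmo_core.train.callbacks import Float8HandlerCallback  # assumed import path

# Attach the float8 handler alongside the other training callbacks.
# `add_callback` and the "float8_handler" name are illustrative only;
# the callback hooks torchao float8 training into the train loop.
trainer.add_callback("float8_handler", Float8HandlerCallback())
```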

Commits

8c03ca8 Add support for float8 training via torchao (#54)
2f253f8 Minor updates to precision settings for official configs, add torchao to Docker/Beaker image (#53)
7e3ddd4 increase batch size for 13B

v1.2.0

25 Sep 17:28

What's new

Added 🎉

  • Added support for wildcards in OptimGroupOverride.params (see the sketch below).
  • Added NumpyPaddedFSLDataset variant.
  • Added Evaluator class and EvaluatorCallback for in-loop evals.
  • Added v3-small-ppl-validation data mix.
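
With wildcard support, an optimizer group override can match many parameters with one glob-style pattern instead of listing each name. A rough sketch; the AdamWConfig class, its group_overrides field, and the opts keyword are assumptions used only to give the override somewhere to live:

```python
from olmo_core.optim import AdamWConfig, OptimGroupOverride  # assumed import paths

optim_config = AdamWConfig(
    lr=3e-4,
    weight_decay=0.1,
    group_overrides=[
        # New in this release: glob-style wildcards in `params`. This puts
        # every bias and norm weight into a group with no weight decay.
        OptimGroupOverride(params=["*.bias", "*.norm.weight"], opts=dict(weight_decay=0.0)),
    ],
)
```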

Fixed ✅

  • Fixed a bug with the data loader when using threading.

Commits

346d7d1 Merge more internal config components
6b156c2 Add v3-small-ppl-validation mix (#52)
10a6ff0 add citation info
b131376 Add in-loop evals (#51)
b81d670 Use Beaker image ID instead of full name
497b756 Add support for wildcards in OptimGroupOverride.params (#50)

v1.1.0

18 Sep 22:50

What's new

Added 🎉

  • Added support for changing train sequence length when loading a checkpoint.
  • Added support for sequence length warm-up during training via the callback SequenceLengthSchedulerCallback (see the sketch below).
  • Added support for variable sequence length (VSL) datasets and VSL curriculums as introduced in "Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum".
  • Added Lion and SkipStepLion optimizers.
  • Added init_seed argument to Transformer and TransformerConfig.
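
The warm-up callback is attached like any other trainer callback. A sketch, assuming it is configured with a starting sequence length and a warm-up duration; the argument names and the add_callback helper are hypothetical:

```python
from olmo_core.train.callbacks import SequenceLengthSchedulerCallback  # assumed import path

# Start training at a short sequence length and ramp up to the full
# length over the first 2,000 steps (argument names are illustrative).
seq_len_scheduler = SequenceLengthSchedulerCallback(
    min_sequence_length=256,
    warmup_steps=2000,
)
trainer.add_callback("seq_len_scheduler", seq_len_scheduler)
```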

Changed ⚠️

  • Renamed MemMapDataset to NumpyFSLDataset.
  • Batch size is now specified in tokens, not instances.
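
Since the batch size is now given in tokens, an existing instance-based setting has to be multiplied by the sequence length. For example:

```python
# Converting an instance-based global batch size to the new token-based one.
sequence_length = 4096
global_batch_size_instances = 256

# 256 instances * 4096 tokens per instance = 1,048,576 tokens per global batch.
global_batch_size_tokens = global_batch_size_instances * sequence_length
assert global_batch_size_tokens == 1_048_576
```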

Commits

3aa7a1b Allow configuring model init seed, make iterable dataset classes into data loaders (#49)
bf48f80 Big changes to dataset API, adds support for variable sequence length training (#48)
3f4bbd3 update configs for latest architecture (#47)
ecfd2d0 Ensure speed monitor can handle variable sequence length (#46)
12c629a Add a callback for sequence length warm-up scheduling (#45)
603952b Clean up optimizer API (#44)
5036d90 Add Lion optimizer and a "skip step" version of it (#43)
50857c4 split up optim module
cef56ea Rename MemMapDataset to NumpyDataset (#42)
6c0ccf3 Add support for sequence length warm-up (#41)
4b65602 minor clean up of distributed utils

v1.0.6

05 Sep 16:29

What's new

Added 🎉

  • Added "selected_modules" transformer activation checkpointing mode.
  • Added OLMo-1B.py official training script.
  • Added OLMo-13B.py official training script.
  • Added Trainer.get_metric(), .get_loss(), and .get_zloss() methods (see the sketch below).
  • Added io.copy_file() function.
  • Added ProfilerCallback for profiling/tracing the training loop with the PyTorch profiler module.
  • Added an "L2 norm" metric reduce type.

Fixed ✅

  • Made reducing metrics more numerically stable with large world sizes.

Commits

c3664ae remove barrier from iterable dataset
073cf0d Add an "L2 norm" metric reduce type
8623a4a Add a PyTorch profiler callback (#40)
78caf74 Add more to docs
c2be557 Add official 13B training script (#39)
d59726b Fix typo in OLMo-1B.py
80b1439 Add links to official training scripts
868b5eb Add official 1B training script (#38)
5bf1731 Add selected_modules transformer activation checkpointing mode (#37)

v1.0.5

03 Sep 16:09

What's new

Fixed ✅

  • Fixed a bug where the checkpointer callback searched for existing ephemeral checkpoints when the checkpoint folder doesn't exist.
  • The checkpointer callback no longer collects for removal any existing ephemeral checkpoints that were saved after the checkpoint the run was loaded from.

Commits

69dec9b don't remove newer checkpoints
9d03cf9 Fix bug with checkpointer callback
8e4c9e2 guard against duplicate tags

v1.0.4

02 Sep 02:11

What's new

Added 🎉

  • Added Trainer.save_checkpoint() and Trainer.save_checkpoint_async() methods (see the sketch below).
  • Added Callback.post_checkpoint_saved() and Callback.post_checkpoint_loaded() methods.
  • Added ConfigSaverCallback.
  • Added MemMapDataset.fingerprint property.
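
A sketch of the new checkpoint methods; whether save_checkpoint_async() returns a future-like object, as shown, is an assumption:

```python
# Blocking save to the trainer's save folder.
trainer.save_checkpoint()

# Non-blocking save: kick off the checkpoint and keep training while it
# completes. Treating the return value like a concurrent.futures.Future
# is an assumption for illustration.
future = trainer.save_checkpoint_async()
# ... continue stepping ...
future.result()  # wait for the async save to finish before exiting
```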

Changed ⚠️

  • The work_dir argument to TrainerConfig now defaults to save_folder if save_folder is a local path, otherwise to a temporary directory with the same name as the basename of save_folder (illustrated below).
  • The seed argument to prepare_training_environment() is now optional.
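
The new work_dir default can be summarized as follows (a sketch of the defaulting rule only; TrainerConfig takes more fields than shown, and the import path is assumed):

```python
from olmo_core.train import TrainerConfig  # assumed import path

# Local save folder: work_dir defaults to the same directory.
cfg = TrainerConfig(save_folder="/data/runs/olmo-1b")
# cfg.work_dir -> "/data/runs/olmo-1b"

# Remote save folder: work_dir defaults to a temporary directory whose
# name is the basename of the save folder.
cfg = TrainerConfig(save_folder="s3://my-bucket/runs/olmo-1b")
# cfg.work_dir -> something like "/tmp/.../olmo-1b"
```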

Fixed ✅

  • Fixed setting the right env vars for single node training on Jupiter.

Commits

da3dfa6 Better dataset fingerprint checking, refactor activation checkpointing (#36)
b81664d Add CI jobs for checking docs build
1d237dc Add to trainer and callback API (#35)
6569131 clean up train/launch scripts
c43039e Update examples and launch scripts (#34)
b06a96f actually fix link
b2a8130 for URL for link

v1.0.3

30 Aug 19:46

What's new

Added 🎉

  • Added Trainer.hard_stop field.
  • The trainer now catches SIGTERM and marks the run as canceled.
  • Added CheckpointerCallback.remove strategy for configuring which old checkpoints found in the save folder are removed (see the sketch below).
  • Added ReorderedNormTransformerBlock implementation.
  • Added WandBCallback.notes field.
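
A sketch of configuring the new remove strategy; the save_interval field and the string value accepted by remove are assumptions, not the documented options:

```python
from olmo_core.train.callbacks import CheckpointerCallback  # assumed import path

checkpointer = CheckpointerCallback(
    save_interval=1000,  # hypothetical field, shown only for context
    # New in this release: `remove` controls which old checkpoints found in
    # the save folder get cleaned up. The string value here is illustrative.
    remove="all_non_permanent",
)
```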

Fixed ✅

  • Fixed bug with how command arguments were expanded by BeakerLaunchConfig.

Commits

6d76f89 Add overview section to docs
c009f89 Minor feature additions (#33)
535722c More trainer / launch improvements (#32)

v1.0.2

29 Aug 18:31

What's new

Added 🎉

  • Added support for unsharding model state into safetensors format with olmo_core.distributed.checkpoint.unshard_checkpoint(..., use_safetensors=True) (see the sketch below).
  • Added data.TokenizerConfig config class and data.TokenizerName enumeration.
  • Added data mixes with data.DataMix API.
  • Added block_idx attribute to the TransformerBlock class.
  • Added init_method option to Transformer for controlling how the weights are initialized.
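
The release notes confirm the use_safetensors=True flag on unshard_checkpoint; the directory arguments below are assumptions about the rest of the signature:

```python
from olmo_core.distributed.checkpoint import unshard_checkpoint

# Unshard a distributed checkpoint into safetensors files.
unshard_checkpoint(
    "/data/checkpoints/step1000",             # sharded checkpoint dir (assumed arg)
    "/data/checkpoints/step1000-unsharded",   # output dir (assumed arg)
    use_safetensors=True,                     # write model state as safetensors
)
```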

Fixed ✅

  • Fixed list_directory for remote folders.

Changed ⚠️

  • Callbacks must now have a name assigned.

Commits

81f3588 Assign a unique name to each callback (#31)
35145b0 Fixes for remote checkpointing (#30)
3d53c65 Add init_method option to Transformer
65e21ac Allow customizing init func of transformer
6a5c5cf Add data mixes (#29)
b35283f add configuration class for tokenizer
e820017 Add warning about missing keys when loading
e254d41 Add project urls for PyPI
0e9e262 Update README.md
de50e54 Update image build pipeline (#28)
95d4cd9 Add support for unsharding model with safetensors

v1.0.1

27 Aug 04:52

What's new

Fixed ✅

  • Fixed a bug with resetting the initial LR in optimizers after loading a checkpoint.

Commits

6e330ba Fix setting peak LR on restarts (#27)
4a9a653 changelog
34a9fc8 update favicon for new brand

v1.0.0

27 Aug 03:45

What's new

Commits

bbe4d58 The next generation of OLMo-core and OLMo training code (#26)
cda6f23 Remove in-house FSDP and ShardedFlatParameter (#25)
27f942f add release process notes
4a41fd9 add install instructions