Releases: allenai/OLMo-core
v1.3.0
What's new
Added 🎉
- Added `torchao` to the Docker/Beaker images.
- Added support for `torchao` `float8` training via the `Float8HandlerCallback` (see the sketch below).
- Added `Callback.post_attach()` method.
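For illustration, a minimal sketch of enabling float8 training with the new callback. Only the `Float8HandlerCallback` name comes from this release; the import path and the way the callback is wired into a run are assumptions.

```python
# Hypothetical sketch only: the import path and trainer wiring are assumptions.
from olmo_core.train.callbacks import Float8HandlerCallback  # assumed module path

# torchao-backed float8 training is enabled by attaching the callback to the
# trainer alongside whatever other callbacks the run already uses.
callbacks = {
    "float8_handler": Float8HandlerCallback(),  # constructor defaults assumed
}
```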
Commits
8c03ca8 Add support for float8 training via torchao (#54)
2f253f8 Minor updates to precision settings for official configs, add torchao to Docker/Beaker image (#53)
7e3ddd4 increase batch size for 13B
v1.2.0
What's new
Added 🎉
- Added support for wildcards in `OptimGroupOverride.params` (see the sketch below).
- Added `NumpyPaddedFSLDataset` variant.
- Added `Evaluator` class and `EvaluatorCallback` for in-loop evals.
- Added `v3-small-ppl-validation` data mix.
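For illustration, a sketch of what a wildcard parameter-group override might look like. Only `OptimGroupOverride.params` is named in these notes; the surrounding optimizer config class and field names are assumptions.

```python
# Hypothetical sketch only: imports and field names besides OptimGroupOverride.params are assumed.
from olmo_core.optim import AdamWConfig, OptimGroupOverride  # assumed imports

optim = AdamWConfig(
    lr=3e-4,
    group_overrides=[
        # The wildcard pattern matches every parameter under any *.norm.* module
        # and gives that group its own weight-decay setting.
        OptimGroupOverride(params=["*.norm.*"], opts=dict(weight_decay=0.0)),
    ],
)
```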
Fixed ✅
- Fixed bug with data loader when using threading.
Commits
346d7d1 Merge more internal config components
6b156c2 Add v3-small-ppl-validation mix (#52)
10a6ff0 add citation info
b131376 Add in-loop evals (#51)
b81d670 Use Beaker image ID instead of full name
497b756 Add support for wildcards in OptimGroupOverride.params (#50)
v1.1.0
What's new
Added 🎉
- Added support for changing train sequence length when loading a checkpoint.
- Added support for sequence length warm-up during training via the `SequenceLengthSchedulerCallback` callback (see the sketch below).
- Added support for variable sequence length (VSL) datasets and VSL curriculums as introduced in "Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum".
- Added `Lion` and `SkipStepLion` optimizers.
- Added `init_seed` argument to `Transformer` and `TransformerConfig`.
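For illustration, a sketch of scheduling a sequence-length warm-up with the new callback. The class name and the `init_seed` argument come from this release; the constructor argument names are assumptions.

```python
# Hypothetical sketch only: constructor argument names are assumptions.
from olmo_core.train.callbacks import SequenceLengthSchedulerCallback  # assumed module path

seq_len_warmup = SequenceLengthSchedulerCallback(
    min_sequence_length=256,  # assumed: start training on short sequences
    warmup_steps=2_000,       # assumed: ramp to the full sequence length by step 2k
)

# The new init_seed argument fixes weight initialization independently of the data seed,
# e.g. TransformerConfig(..., init_seed=12345)  # usage assumed
```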
Changed ⚠️
- Renamed `MemMapDataset` to `NumpyFSLDataset`.
- Batch size is now specified in tokens, not instances.
Commits
3aa7a1b Allow configuring model init seed, make iterable dataset classes into data loaders (#49)
bf48f80 Big changes to dataset API, adds support for variable sequence length training (#48)
3f4bbd3 update configs for latest architecture (#47)
ecfd2d0 Ensure speed monitor can handle variable sequence length (#46)
12c629a Add a callback for sequence length warm-up scheduling (#45)
603952b Clean up optimizer API (#44)
5036d90 Add Lion optimizer and a "skip step" version of it (#43)
50857c4 split up optim module
cef56ea Rename MemMapDataset to NumpyDataset (#42)
6c0ccf3 Add support for sequence length warm-up (#41)
4b65602 minor clean up of distributed utils
v1.0.6
What's new
Added 🎉
- Added "selected_modules" transformer activation checkpointing mode.
- Added `OLMo-1B.py` official training script.
- Added `OLMo-13B.py` official training script.
- Added `Trainer.get_metric()`, `.get_loss()`, and `.get_zloss()` methods (see the sketch below).
- Added `io.copy_file()` function.
- Added `ProfilerCallback` for profiling/tracing the training loop with PyTorch `profiler` module.
- Added an "L2 norm" metric reduce type.
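For illustration, a sketch of reading metrics from a custom callback with the new accessors. The accessor names come from this release; the hook name and the `trainer` attribute are assumptions.

```python
# Hypothetical sketch only: the hook name and trainer attribute are assumptions.
from olmo_core.train.callbacks import Callback  # assumed module path

class LossPrinterCallback(Callback):
    def post_step(self):  # assumed hook name
        # New accessors for metrics recorded during the step.
        loss = self.trainer.get_loss()
        z_loss = self.trainer.get_zloss()
        print(f"loss={loss}, z-loss={z_loss}")
```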
Fixed ✅
- Made reducing metrics more numerically stable with large world sizes.
Commits
c3664ae remove barrier from iterable dataset
073cf0d Add an "L2 norm" metric reduce type
8623a4a Add a PyTorch profiler callback (#40)
78caf74 Add more to docs
c2be557 Add official 13B training script (#39)
d59726b Fix typo in OLMo-1B.py
80b1439 Add links to official training scripts
868b5eb Add official 1B training script (#38)
5bf1731 Add selected_modules transformer activation checkpointing mode (#37)
v1.0.5
What's new
Fixed ✅
- Fixed bug with checkpointer callback searching for existing ephemeral checkpoints when the checkpoint folder doesn't exist.
- Checkpointer callback won't collect existing ephemeral checkpoints that were saved after the checkpoint that was loaded from.
Commits
69dec9b don't remove newer checkpoints
9d03cf9 Fix bug with checkpointer callback
8e4c9e2 guard against duplicate tags
v1.0.4
What's new
Added 🎉
- Added `Trainer.save_checkpoint()` and `Trainer.save_checkpoint_async()` methods (see the sketch below).
- Added `Callback.post_checkpoint_saved()` and `Callback.post_checkpoint_loaded()` methods.
- Added `ConfigSaverCallback`.
- Added `MemMapDataset.fingerprint` property.
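For illustration, a sketch of triggering a checkpoint save and reacting to it from a callback. The method and hook names come from this release; the hook signature and trainer usage are assumptions.

```python
# Hypothetical sketch only: the hook signature and trainer usage are assumptions.
from olmo_core.train.callbacks import Callback  # assumed module path

class CheckpointNoticeCallback(Callback):
    def post_checkpoint_saved(self, path):  # new hook; exact signature assumed
        print(f"checkpoint saved to {path}")

# During training one can now also trigger saves directly:
#   trainer.save_checkpoint()        # blocking save
#   trainer.save_checkpoint_async()  # returns without waiting for the save to finish
```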
Changed ⚠️
- The `work_dir` argument to `TrainerConfig` now defaults to `save_folder` if `save_folder` is a local path, otherwise to a temporary directory with the same name as the basename of the `save_folder`.
- The `seed` argument to `prepare_training_environment()` is now optional.
Fixed ✅
- Fixed setting the right env vars for single node training on Jupiter.
Commits
da3dfa6 Better dataset fingerprint checking, refactor activation checkpointing (#36)
b81664d Add CI jobs for checking docs build
1d237dc Add to trainer and callback API (#35)
6569131 clean up train/launch scripts
c43039e Update examples and launch scripts (#34)
b06a96f actually fix link
b2a8130 for URL for link
v1.0.3
What's new
Added 🎉
- Added `Trainer.hard_stop` field.
- The trainer now catches `SIGTERM` and marks the run as canceled.
- Added `CheckpointerCallback.remove` strategy for configuring which old checkpoints found in the save folder are removed (see the sketch below).
- Added `ReorderedNormTransformerBlock` implementation.
- Added `WandBCallback.notes` field.
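For illustration, a sketch of configuring checkpoint removal and an early hard stop. The `remove` field and `hard_stop` attribute come from this release; the parameter names and accepted values are assumptions.

```python
# Hypothetical sketch only: parameter names and accepted values are assumptions.
from olmo_core.train.callbacks import CheckpointerCallback  # assumed module path

checkpointer = CheckpointerCallback(
    save_interval=1_000,      # assumed parameter name
    remove="ephemeral_only",  # assumed value for the new removal strategy
)

# trainer.hard_stop = 10_000  # assumed usage: force the run to stop at step 10k
```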
Fixed ✅
- Fixed bug with how command arguments were expanded by `BeakerLaunchConfig`.
Commits
6d76f89 Add overview section to docs
c009f89 Minor feature additions (#33)
535722c More trainer / launch improvements (#32)
v1.0.2
What's new
Added 🎉
- Added support for unsharding model state into `safetensors` format with `olmo_core.distributed.checkpoint.unshard_checkpoint(..., use_safetensors=True)` (see the sketch below).
- Added `data.TokenizerConfig` config class and `data.TokenizerName` enumeration.
- Added data mixes with `data.DataMix` API.
- Added `block_idx` attribute to the `TransformerBlock` class.
- Added `init_method` option to `Transformer` for controlling how the weights are initialized.
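For illustration, a sketch of unsharding a checkpoint to safetensors. The function path and `use_safetensors` flag are given in these notes; the positional input/output arguments are assumptions.

```python
# The function and flag are named in the release notes; the paths below are assumptions.
from olmo_core.distributed.checkpoint import unshard_checkpoint

unshard_checkpoint(
    "/checkpoints/run01/step1000",            # assumed: directory of the sharded checkpoint
    "/checkpoints/run01/step1000-unsharded",  # assumed: where to write the merged weights
    use_safetensors=True,                     # new option from this release
)
```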
Fixed ✅
- Fixed `list_directory` for remote folders.
Changed ⚠️
- Callbacks now have to have a name assigned.
Commits
81f3588 Assign a unique name to each callback (#31)
35145b0 Fixes for remote checkpointing (#30)
3d53c65 Add init_method option to Transformer
65e21ac Allow customizing init func of transformer
6a5c5cf Add data mixes (#29)
b35283f add configuration class for tokenizer
e820017 Add warning about missing keys when loading
e254d41 Add project urls for PyPI
0e9e262 Update README.md
de50e54 Update image build pipeline (#28)
95d4cd9 Add support for unsharding model with safetensors