All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## v1.8.0 - 2025-01-29

- Added support for tensor parallelism. See the `TransformerConfig` class for usage.
- Added more downstream tasks from the model ladder.
- Added `io.copy_dir()` function.
- Added new LR schedulers: `LinearWithWarmup`, `InvSqrtWithWarmup`, `ConstantWithWarmup`, and `SequentialScheduler` (a usage sketch follows at the end of this section).
- Added option to pre-download checkpoint files from remote storage before trying to load a checkpoint.
- Added a callback for sending Slack notifications.
- Made the MPS device work on Apple Silicon.
- Added `SkipStepAdamW` optimizer.
- The trainer can load model-only checkpoints now.
- Added the option to throttle checkpoint uploads to one rank from each node at a time.
- Added support for logging rich `Table` objects as text in source mixture datasets.
- Added `unshard_strategy` parameter to the `unshard_checkpoint()` function in `olmo_core.distributed.checkpoint`.
- Added `load_keys()` function to `olmo_core.distributed.checkpoint`.
- Changed storage of shared shard state in sharded checkpoints from smallest shard to lowest rank (normally 0).
- Changed how the trainer handles loading a checkpoint when `load_path` is provided. Now `load_path` is only used if no checkpoint is found in the `save_folder`.
- Added missing `weights_only=False` argument to fix loading train checkpoints with newer versions of PyTorch.
- Fixed bug where GCS upload does not retry on transient failures.
- Fixed bug where source mixture datasets were truncating source files instead of randomly sampling.
- Fixed bug in source mixture datasets where sampling from small `.npy` files raised an mmap exception due to 0 instances in the sampled index.
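
The following is a minimal sketch of how one of the new LR schedulers might be used. The import path, the constructor argument (`warmup_steps`), and the `get_lr(initial_lr, step, max_steps)` signature are assumptions based on the existing scheduler API, not something this entry specifies.

```python
# Hedged sketch: drive one of the new schedulers manually to inspect the LR curve.
# The import path, `warmup_steps`, and the `get_lr()` signature are assumed here.
from olmo_core.optim import LinearWithWarmup

scheduler = LinearWithWarmup(warmup_steps=2_000)  # assumed constructor argument

base_lr, max_steps = 4e-4, 100_000
for step in (0, 1_000, 2_000, 50_000, 100_000):
    lr = scheduler.get_lr(base_lr, step, max_steps)  # assumed signature
    print(f"step {step:>7}: lr = {lr:.2e}")
```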
## v1.7.0 - 2024-11-27

- Added `key_mapping` argument to `olmo_core.distributed.checkpoint.load_model_and_optim_state()` for loading checkpoints with different key names (see the sketch at the end of this section).
- Added `load_key_mapping` field to the trainer, same idea as the new `key_mapping` argument above.
- Added an implementation of nGPT called `NormalizedTransformer`.
- Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
- Added an API for scaling RoPE embeddings.
- Added a `ModelLadder` API.
- The `w_out` and `norm` top-level children of the `Transformer` model are now wrapped together in an `lm_head` module. Training scripts will have backwards compatibility with older checkpoints due to the `load_key_mapping` explained above.
- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompile during evals or variable-sequence / batch size training. This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
- Fixed a bug where we were always getting dolma2 tokenized validation data when generating a config with `DataMix.v3_small_ppl_validation`.
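
As an illustration of the new `key_mapping` argument, here is a hedged sketch of loading a checkpoint that predates the `lm_head` change above. The state-dict key names, the mapping direction, and the positional arguments of `load_model_and_optim_state()` shown here are assumptions for illustration only.

```python
# Hedged sketch: load an older checkpoint (with top-level `w_out` / `norm`)
# into a model that now wraps them in `lm_head`. The key strings, mapping
# direction, and call signature are illustrative assumptions.
from olmo_core.distributed.checkpoint import load_model_and_optim_state

model = ...  # your Transformer instance (construction omitted)
optim = ...  # optional: the optimizer whose state should also be restored

key_mapping = {
    # old checkpoint key -> new model key (hypothetical names)
    "w_out.weight": "lm_head.w_out.weight",
    "norm.weight": "lm_head.norm.weight",
}

load_model_and_optim_state(
    "/path/to/old/checkpoint",  # hypothetical checkpoint directory
    model,
    optim,
    key_mapping=key_mapping,
)
```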
## v1.6.3 - 2024-11-15

- Added `olmo_core.distributed.checkpoint.get_checkpoint_metadata()` function.
- (BETA) Added flag to compile the optimizer step. So far only tested with AdamW. May not work with other optimizers.
- Old ephemeral checkpoints won't be removed until after the latest ephemeral checkpoint is saved successfully.
- Made GCS uploads more robust.
- Fixed single-node training on Google Augusta cluster.
- `numpy.random.dirichlet()` does not always sum to 1.0, so allow for a small tolerance in validating domain weights.
## v1.6.2 - 2024-11-08

- Added option to disable `GarbageCollectorCallback`, not that you'd want to do this usually, but I needed to run an experiment to show how important that callback is.
- Fixed a bug where some default callbacks could be added twice if given a different name by the user.
- Fixed a bug where some `Trainer` bookkeeping tasks may not complete before `.fit()` returns.
## v1.6.1 - 2024-11-06

- Added `retries` field to `BeakerLaunchConfig`.
- Allow running on Augusta cluster with existing train scripts.
- Added `olmo_core.utils.logging_configured()` function to check if logging has been configured.
- Fixed a potential distributed deadlock bug when training without a separate CPU-only bookkeeping backend.
- Removed some unnecessary host-device syncs in `olmo_core.distributed.utils`.
- Added `Trainer(Config).async_bookkeeping` field to toggle async bookkeeping.
## v1.6.0 - 2024-11-01

- Added option to compile the trainer's loss function (`Trainer.compile_loss`).
- Added `SourceMixtureDataset` for composing a training mixture based on ratios of source datasets.
- Added `NumpyFSLDatasetMixture` for constructing a `NumpyDatasetBase` from a `SourceMixtureDataset`. Note this is only supported for FSL datasets.
- Added tests for `SourceMixture*` and `NumpyFSLDatasetMixture`.
- Added `DownstreamEvaluatorCallbackConfig` class for running in-loop downstream eval via OLMo-in-loop-evals.
- Moved some types into `olmo_core.data.types` to avoid some circular dependencies.
- Made GCS client more robust by automatically retrying timeout errors for most operations.
## v1.5.0 - 2024-10-23

- Added Google Cloud support for `list_directory()` and `clear_directory()`.
- Added `CometCallback` for logging training runs to Comet.ml.
- Added `DataMixBase` class, to allow extending to new data mix groups.
- Added support for MoE-based models.
- Added method `DataLoaderBase.get_mock_batch()`.
- Trainer now starts with a dry-run of a fake batch created by `DataLoaderBase.get_mock_batch()`.
- Added `Callback.pre_backward()`, `.pre_eval_batch()`, and `.post_eval_batch()` methods.
- Added `Trainer.model_forward()`, `.get_losses()`, and `.eval_batch()` methods.
- Added a new `TransformerActivationCheckpointingMode`, "selected_ops" (requires torch 2.5 or newer).
- `BeakerLaunchConfig.setup_steps` should now include steps to clone your repo (which it will by default). This change allows support for private repos.
- `prepare_cli_environment()` now calls `add_cached_path_clients()`.
- Removed an unnecessary host-device sync.
## v1.4.0 - 2024-10-02

- Updated default layer norm epsilon for OLMo models from `1e-5` to `1e-6` to match the latest model.
- Renamed `FSLDataLoader` to `NumpyFSLDataLoader`.
- Renamed `VSLDataLoader` to `NumpyVSLDataLoader`.
- The trainer now takes a `data_loader: DataLoaderBase` instead of a `dataset: NumpyDatasetBase`.
## v1.3.2 - 2024-09-27

- Added `Config.validate()`, `Config.replace()`, and `Config.apply()` methods.
- Trainer now records sequence length as a metric.
- Ensure additional cached-path clients are added in the process pool workers from some dataset preparation methods.
- Fixed `label_mask` tensor created by `NumpyPaddedFSLDataset`.
- Removed redundant warning messages about CUDA alloc retries.
- Fixed non-deterministic deadlock bug with async checkpointing.
## v1.3.1 - 2024-09-26
- Fixed the name given to evaluator metrics logged.
## v1.3.0 - 2024-09-26

- Added `torchao` to the Docker/Beaker images.
- Added support for `torchao` `float8` training via the `Float8HandlerCallback`.
- Added `Callback.post_attach()` method.
## v1.2.0 - 2024-09-25

- Added support for wildcards in `OptimGroupOverride.params` (see the sketch at the end of this section).
- Added `NumpyPaddedFSLDataset` variant.
- Added `Evaluator` class and `EvaluatorCallback` for in-loop evals.
- Added `v3-small-ppl-validation` data mix.
- Fixed bug with data loader when using threading.
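
As a companion to the wildcard support above, here is a hedged sketch of an optimizer config that routes all bias parameters into their own group. The surrounding names (`AdamWConfig`, `group_overrides`, `opts`) are assumptions about the optimizer config API; this entry only states that wildcards are supported in `OptimGroupOverride.params`.

```python
# Hedged sketch: use a wildcard in OptimGroupOverride.params to disable weight
# decay for every parameter whose name ends in ".bias". The config classes and
# field names around it are assumed, not guaranteed by this changelog entry.
from olmo_core.optim import AdamWConfig, OptimGroupOverride

optim_config = AdamWConfig(
    lr=3e-4,
    group_overrides=[
        OptimGroupOverride(params=["*.bias"], opts=dict(weight_decay=0.0)),
    ],
)
```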
## v1.1.0 - 2024-09-18
- Added support for changing train sequence length when loading a checkpoint.
- Added support for sequence length warm-up during training via the callback `SequenceLengthSchedulerCallback`.
- Added support for variable sequence length (VSL) datasets and VSL curriculums as introduced in "Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum".
- Added `Lion` and `SkipStepLion` optimizers.
- Added `init_seed` argument to `Transformer` and `TransformerConfig`.
- Renamed `MemMapDataset` to `NumpyFSLDataset`.
- Batch size is now specified in tokens, not instances.
## v1.0.6 - 2024-09-05
- Added "selected_modules" transformer activation checkpointing mode.
- Added `OLMo-1B.py` official training script.
- Added `OLMo-13B.py` official training script.
- Added `Trainer.get_metric()`, `.get_loss()`, and `.get_zloss()` methods.
- Added `io.copy_file()` function.
- Added `ProfilerCallback` for profiling/tracing the training loop with the PyTorch `profiler` module.
- Added an "L2 norm" metric reduce type.
- Made reducing metrics more numerically stable with large world sizes.
## v1.0.5 - 2024-09-03
- Fixed bug with checkpointer callback searching for existing ephemeral checkpoints when the checkpoint folder doesn't exist.
- Checkpointer callback won't collect existing ephemeral checkpoints that were saved after the checkpoint that was loaded from.
## v1.0.4 - 2024-09-01

- Added `Trainer.save_checkpoint()` and `Trainer.save_checkpoint_async()` methods.
- Added `Callback.post_checkpoint_saved()` and `Callback.post_checkpoint_loaded()` methods.
- Added `ConfigSaverCallback`.
- Added `MemMapDataset.fingerprint` property.
- The `work_dir` argument to `TrainerConfig` now defaults to `save_folder` if `save_folder` is a local path, otherwise to a temporary directory with the same name as the basename of the `save_folder`.
- The `seed` argument to `prepare_training_environment()` is now optional.
- Fixed setting the right env vars for single node training on Jupiter.
## v1.0.3 - 2024-08-30

- Added `Trainer.hard_stop` field.
- The trainer now catches `SIGTERM` and marks the run as canceled.
- Added `CheckpointerCallback.remove` strategy for configuring which old checkpoints found in the save folder are removed.
- Added `ReorderedNormTransformerBlock` implementation.
- Added `WandBCallback.notes` field.
- Fixed bug with how command arguments were expanded by `BeakerLaunchConfig`.
## v1.0.2 - 2024-08-29

- Added support for unsharding model state into `safetensors` format with `olmo_core.distributed.checkpoint.unshard_checkpoint(..., use_safetensors=True)` (see the sketch at the end of this section).
- Added `data.TokenizerConfig` config class and `data.TokenizerName` enumeration.
- Added data mixes with `data.DataMix` API.
- Added `block_idx` attribute to the `TransformerBlock` class.
- Added `init_method` option to `Transformer` for controlling how the weights are initialized.
- Fixed `list_directory` for remote folders.
- Callbacks now have to have a name assigned.
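
A hedged sketch of the `safetensors` unsharding call mentioned above; the directory arguments are hypothetical placeholders, since the entry only shows `unshard_checkpoint(..., use_safetensors=True)`.

```python
# Hedged sketch: unshard a distributed checkpoint into safetensors files.
# The two directory arguments are hypothetical; only `use_safetensors=True`
# comes from the changelog entry above.
from olmo_core.distributed.checkpoint import unshard_checkpoint

unshard_checkpoint(
    "/path/to/sharded/checkpoint",  # hypothetical: source (sharded) checkpoint dir
    "/path/to/unsharded/output",    # hypothetical: destination dir
    use_safetensors=True,
)
```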
## v1.0.1 - 2024-08-26
- Fixed a bug with resetting the initial LR in optimizers after loading a checkpoint.
## v1.0.0 - 2024-08-26
- Ported, refactored, and optimized the modeling and training from the OLMo repo while fixing several bugs. Introduces a new highly efficient yet customizable trainer and a standard API for launching jobs directly to Beaker from a Python script.
## v0.1.0 - 2024-06-11
- Initial release.