
NVIDIA BioNeMo Framework 2.0

@tshimko-nv released this 23 Oct 21:54 · 291d0ac

New Features:

  • ESM2 implementation
    • State-of-the-art training performance with accuracy equivalent to the reference implementation
    • 650M and 3B parameter checkpoints available that mirror the reference model
    • Flexible fine-tuning examples that can be copied and modified to accomplish a wide variety of downstream tasks
  • First version of our NeMo v2-based reference implementation, which re-imagines BioNeMo as a repository of Megatron models, dataloaders, and training recipes that use NeMo v2 for training loops.
    • The modular design and permissive Apache 2.0 OSS license enable importing and using our framework in proprietary applications.
    • NeMo v2 training abstractions allow the user to focus on the model implementation while the training strategy handles distribution and model parallelism.
  • Documentation and documentation build system for BioNeMo 2.

Known Issues:

  • PEFT support is not yet fully functional.
  • A partial implementation of Geneformer is present; use it at your own risk. It will be optimized and officially released in the future.
  • The command line interface is currently based on one-off training recipes and scripts. We are working on a configuration-based approach that will be released in the future.
  • The fine-tuning workflow is implemented for BERT-based architectures and could be adapted for others, but it requires you to inherit from the biobert base model config. In the short term, you can follow similar patterns to partially load weights from an old checkpoint into a new model (see the second sketch after this list); in the future we will provide a more direct API that is easier to follow.
  • A slow memory leak occurs during ESM-2 pretraining, which can cause out-of-memory (OOM) errors during long pretraining runs. For example, training with a microbatch size of 48 on 40 A100s raised an out-of-memory error after 5,800 training steps.
    • Possible workarounds include calling gc.collect(); torch.cuda.empty_cache() roughly every 1,000 steps, which appears to reclaim the consumed memory, or training with a smaller microbatch size and periodically restarting training from a saved checkpoint (see the first sketch after this list).
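
The periodic-cleanup workaround above can be wired into a training run as a callback. The sketch below assumes a PyTorch Lightning-style trainer; the callback class name and the 1,000-step interval are illustrative, not part of the BioNeMo API.

```python
import gc

import torch
from pytorch_lightning import Callback


class PeriodicMemoryCleanup(Callback):
    """Illustrative callback: free leaked memory every `every_n_steps` training steps."""

    def __init__(self, every_n_steps: int = 1000):
        self.every_n_steps = every_n_steps

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Run Python garbage collection and release cached CUDA blocks, which
        # appears to reclaim the memory consumed by the slow leak.
        if trainer.global_step > 0 and trainer.global_step % self.every_n_steps == 0:
            gc.collect()
            torch.cuda.empty_cache()
```

The callback can be appended to the trainer's `callbacks` list; alternatively, the same two calls can be placed directly inside a custom training loop.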
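Separately, the fine-tuning item above mentions partially loading weights from an old checkpoint into a new model. A minimal, framework-agnostic sketch of that pattern in plain PyTorch follows; the model classes and checkpoint path are hypothetical stand-ins, not BioNeMo classes.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Stand-in for a pretrained BERT-style encoder."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.layer = nn.Linear(hidden, hidden)


class PretrainModel(nn.Module):
    """Pretraining-style model: shared encoder plus a language-modeling head."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.encoder = Encoder(hidden)
        self.lm_head = nn.Linear(hidden, 100)


class FineTuneModel(nn.Module):
    """New model: reuses the encoder and adds a freshly initialized task head."""

    def __init__(self, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        self.encoder = Encoder(hidden)
        self.head = nn.Linear(hidden, num_classes)


# Simulate an existing pretraining checkpoint (hypothetical path).
torch.save(PretrainModel().state_dict(), "pretrained_checkpoint.pt")

# Partially load it: encoder weights transfer, the new task head keeps its
# fresh initialization, and the old LM head is ignored thanks to strict=False.
model = FineTuneModel()
state = torch.load("pretrained_checkpoint.pt", map_location="cpu")
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing keys:", missing)        # the new task head
print("unexpected keys:", unexpected)  # the old LM head
```

In the BioNeMo workflow described above, the analogous step is inheriting from the biobert base model config so that checkpoint keys line up between the old and new models.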

External Partner Contributions

We would like to thank the following organizations for their insightful discussions guiding the development of the BioNeMo Framework and their valuable contributions to the codebase. We are grateful for your collaboration.

Changes

Full Changelog: https://github.com/NVIDIA/bionemo-framework/commits/v2.0

Documentation and Field Support

Additional support and significant documentation overhauls were performed by: