Well documented examples of running distributed training jobs on Modal

modal-labs/multinode-training-guide


Important

Our multi-node cluster training product is in early preview and is not yet generally available. Please contact us for access.


Modal Multinode Training Guide

Well-documented examples of running distributed training jobs on Modal. Use this repository to learn how to build your own.

Examples

  • resnet50/: trains a ResNet50 model on the ImageNet dataset.
  • nanoGPT/: trains Karpathy's nanoGPT, a reproduction of OpenAI's GPT-2.
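
Both examples are multi-node training jobs, so it may help to see how work is typically divided across nodes and GPUs in data-parallel training. The sketch below is a minimal, framework-free illustration and is not taken from this repository or from Modal's API. The function names `global_rank` and `shard_indices` are hypothetical, and the padded, strided split is an assumption modeled on how samplers such as PyTorch's `DistributedSampler` behave when shuffling is disabled.

```python
def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Map a (node, local GPU) pair to a unique rank across the whole cluster."""
    return node_rank * gpus_per_node + local_rank


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Return the dataset indices one rank processes in a single epoch.

    The index list is padded by wrapping around so every rank receives the
    same number of samples, then split with a stride of world_size.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return padded[rank::world_size]


if __name__ == "__main__":
    # Hypothetical cluster shape: 2 nodes with 4 GPUs each.
    nodes, gpus_per_node = 2, 4
    world_size = nodes * gpus_per_node
    for node in range(nodes):
        for local in range(gpus_per_node):
            rank = global_rank(node, local, gpus_per_node)
            print(rank, shard_indices(10, rank, world_size))
```

Padding keeps the per-rank batch count equal, which matters because collective operations such as gradient all-reduce expect every rank to take the same number of steps.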

Documentation

The multi-node training guide is currently available on Notion: modal-com.notion.site/Multi-node-docs.

Other relevant documentation in our guide:

Demo

Video: multinode-resnet50.online-video-cutter.com.mp4

License

The MIT license.
