Important
Our multi-node cluster training product is in early preview and not yet generally available. Please contact us for access.
Well-documented examples of running distributed training jobs on Modal. Use this repository to learn how to build your own.
resnet50/: training a ResNet50 model on the ImageNet dataset.
nanoGPT/: training Karpathy's nanoGPT reproduction of OpenAI's GPT-2.
The multi-node training guide is currently available on Notion: modal-com.notion.site/Multi-node-docs.
Other relevant documentation in our guide:
The MIT license.