This directory contains Docker-based Slurm cluster configurations for learning and testing. It includes a basic Slurm setup and an extended configuration for machine learning jobs.
A minimal Slurm cluster setup to learn the basics of:
- Job submission and management
- Resource allocation
- Job arrays
- Basic monitoring and administration
cd slurm-basic
docker-compose up -d --build
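As an illustration of job submission and resource allocation, here is a minimal sketch of a batch script. The job name, resource values, and the `summarize_allocation` helper are assumptions chosen for illustration and are not taken from this repository.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

# Slurm exports these variables inside a job. The defaults let the script
# also run directly under a shell outside the cluster.
summarize_allocation() {
    echo "job=${SLURM_JOB_ID:-none} cpus=${SLURM_CPUS_PER_TASK:-1} host=$(uname -n)"
}

summarize_allocation
```

If saved as `hello.sh` (a hypothetical filename), it could be submitted from the controller with `sbatch hello.sh` and tracked with `squeue`.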
Extended Slurm configuration with Python ML libraries and example jobs showing:
- Single-node ML training
- Hyperparameter search using job arrays
- Distributed training across multiple nodes
- ML job monitoring and management
cd slurm-ml
docker-compose up -d --build
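A hyperparameter search with job arrays usually maps each array index to one parameter setting. The sketch below assumes a hypothetical grid of four learning rates and a hypothetical `train.py`. It is not the example job shipped in `slurm-ml`.

```shell
#!/bin/bash
#SBATCH --job-name=hparam-search
#SBATCH --array=0-3
#SBATCH --ntasks=1
#SBATCH --output=hparam_%A_%a.out

# Map an array index to a learning rate (hypothetical grid of four values).
pick_learning_rate() {
    case "$1" in
        0) echo "0.1" ;;
        1) echo "0.01" ;;
        2) echo "0.001" ;;
        3) echo "0.0001" ;;
        *) echo "unknown array index: $1" >&2; return 1 ;;
    esac
}

# SLURM_ARRAY_TASK_ID is set by Slurm for each array element. Default to 0
# so the script can be dry-run outside the cluster.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
lr="$(pick_learning_rate "$task_id")" || exit 1
echo "task ${task_id}: learning rate ${lr}"
# A real job would start training here, for example with a hypothetical
# train.py: python train.py --lr "$lr"
```

Each array element writes its own output file, because `%A` expands to the array's job ID and `%a` to the element's index.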
- Choose your configuration:
- For learning Slurm basics: Use the basic setup
- For ML workloads: Use the ML-ready setup
- Navigate to the chosen directory:
# For basic setup
cd slurm-basic
# OR for ML setup
cd slurm-ml
- Start the cluster:
docker-compose up -d --build
- Connect to the controller node:
docker exec -it slurmctld bash
- Verify the cluster:
sinfo
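Beyond `sinfo`, a quick way to confirm that jobs actually run is to launch a trivial command on one node. The following is a sketch meant to be run inside the controller container. The `run_smoke_test` helper is a hypothetical name, not something this repository provides.

```shell
#!/bin/sh
# List the partitions, then run `hostname` on one compute node.
run_smoke_test() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this inside the controller container"
        return 1
    fi
    sinfo
    srun --nodes=1 hostname
}

run_smoke_test || echo "smoke test did not complete"
```

If `srun` prints the hostname of a compute node, the controller is scheduling work on the cluster.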
- Docker Engine installed
- Docker Compose installed
- At least 4GB RAM available
- About 10GB free disk space
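The requirements listed above can be checked roughly from a shell before starting. This is a sketch only: `have_command` is a hypothetical helper, the disk check measures the filesystem of the current directory, and available RAM is not checked because the method differs between platforms.

```shell
#!/bin/sh
# Return success if the named command is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

for cmd in docker docker-compose; do
    if have_command "$cmd"; then
        echo "found: $cmd"
    else
        echo "missing: $cmd"
    fi
done

# Free space on the current filesystem, converted from 1024-byte blocks to GiB.
free_kb="$(df -Pk . | awk 'NR==2 {print $4}')"
echo "free disk: $((free_kb / 1024 / 1024)) GiB (about 10 needed)"
```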
- Basic Slurm usage: Check the basic setup guide
- ML jobs: See the ML setup guide
- Slurm documentation: Official Slurm docs