Slurm Learning Environment

This directory contains Docker-based Slurm cluster configurations for learning and testing. It includes both a basic Slurm setup and a configuration for machine learning jobs.

Available Configurations

Basic setup (slurm-basic): a minimal Slurm cluster for learning the basics of:

  • Job submission and management
  • Resource allocation
  • Job arrays
  • Basic monitoring and administration

To start it:

cd slurm-basic
docker-compose up -d --build
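
As a rough illustration of the job-array topic listed above, the following sketch writes a small batch script and submits it only when sbatch is on the PATH. The file name hello_array.sbatch, the job name, and the array range 0-3 are assumptions chosen for illustration; they are not taken from this repository.

```shell
# Write a minimal job-array batch script (hypothetical file name).
cat > hello_array.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello-array
#SBATCH --array=0-3
#SBATCH --ntasks=1
#SBATCH --output=hello_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID to a different index for each array task.
echo "Task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
EOF

# Submit only when Slurm is available, e.g. inside the controller container.
if command -v sbatch >/dev/null 2>&1; then
  sbatch hello_array.sbatch
else
  echo "sbatch not found; run this inside the slurmctld container"
fi
```

In the output pattern, %A expands to the array's job ID and %a to the task index, so each task writes its own file. After submission, squeue lists the pending and running array tasks.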

ML-ready setup (slurm-ml): an extended Slurm configuration with Python ML libraries and example jobs that demonstrate:

  • Single-node ML training
  • Hyperparameter search using job arrays
  • Distributed training across multiple nodes
  • ML job monitoring and management

To start it:

cd slurm-ml
docker-compose up -d --build
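
A hyperparameter search with job arrays usually comes down to mapping each array index to one parameter setting. The sketch below shows one way to do that in a batch script. The learning-rate values, the pick_lr helper, and the train.py script name are all hypothetical and may differ from the example jobs shipped in slurm-ml.

```shell
# Map a job-array index to one learning rate (values are illustrative).
pick_lr() {
  case "$1" in
    0) echo 0.1 ;;
    1) echo 0.01 ;;
    2) echo 0.001 ;;
    *) echo "unknown task index: $1" >&2; return 1 ;;
  esac
}

# Inside a batch script submitted with `sbatch --array=0-2`, Slurm exports
# SLURM_ARRAY_TASK_ID; default to 0 so the sketch also runs outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
lr="$(pick_lr "$task_id")"

# train.py is a hypothetical script name; print the command instead of running it.
echo "python train.py --lr ${lr}"
```

Keeping the index-to-parameter mapping in one function means the array range passed to sbatch is the only other place that needs to change when settings are added.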

Quick Start

  1. Choose your configuration:

    • For learning Slurm basics: use the basic setup
    • For ML workloads: use the ML-ready setup

  2. Navigate to the chosen directory:

# For basic setup
cd slurm-basic

# OR for ML setup
cd slurm-ml

  3. Start the cluster:

docker-compose up -d --build

  4. Connect to the controller node:

docker exec -it slurmctld bash

  5. Verify the cluster:

sinfo
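
The Slurm daemons may need a moment after startup before sinfo responds, so a scripted version of the verification step can retry instead of failing on the first attempt. The wait_for helper and the WAIT_INTERVAL variable below are hypothetical names introduced for this sketch; only the slurmctld container name and the sinfo command come from the steps above.

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts; WAIT_INTERVAL (seconds) defaults to 2.
    sleep "${WAIT_INTERVAL:-2}"
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): poll sinfo in the controller, then run a smoke test.
#   wait_for 30 docker exec slurmctld sinfo
#   docker exec slurmctld srun -N1 hostname
```

If the srun smoke test prints a compute node's hostname, job launch works end to end, which is a stronger check than sinfo alone.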

Prerequisites

  • Docker Engine installed
  • Docker Compose installed (the commands in this README use the docker-compose binary)
  • At least 4 GB RAM available
  • About 10 GB free disk space
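
The tool and disk-space prerequisites can be checked with a short script before starting a cluster. This is only a sketch: the have and free_disk_gb helpers are hypothetical names, the disk check looks at the filesystem holding the current directory (Docker may store its data elsewhere), and the RAM check is left out because it is platform-specific.

```shell
# Return success if the named command is on the PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Free space in whole GB on the filesystem holding the current directory.
free_disk_gb() {
  df -Pk . | awk 'NR == 2 { print int($4 / 1048576) }'
}

for tool in docker docker-compose; do
  if have "$tool"; then
    echo "ok: $tool found"
  else
    echo "missing: $tool"
  fi
done

if [ "$(free_disk_gb)" -ge 10 ]; then
  echo "ok: at least 10 GB free disk space"
else
  echo "warning: less than 10 GB free in $(pwd)"
fi
```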

Need Help?