MNIST Digit Classification Using Docker Compose: Training, Evaluation, and Inference

This project provides a Docker Compose configuration that handles training, evaluation, and inference for an MNIST digit classifier trained with PyTorch's Hogwild (multi-process) approach. Docker Compose orchestrates three services: train, evaluate, and infer.

Table of Contents

  • Requirements
  • Introduction to Docker and Docker Compose
  • Docker Compose Services
  • Command-Line Arguments
  • Docker Compose Configuration
  • Instructions
  • Results

Requirements 📦

  • torch
  • torchvision

You can install the requirements using the following command:

pip install -r requirements.txt

Introduction to Docker and Docker Compose 🐳

Docker is an open-source platform that automates the deployment of applications in lightweight, portable containers. Containers allow developers to package an application along with its dependencies, ensuring consistency across environments.

Docker Compose is a tool specifically designed to define and manage multi-container Docker applications. It allows you to describe how the different services in an application (e.g., training, evaluation, and inference) interact with each other, making the whole setup easier to maintain, scale, and manage. Docker Compose helps in building machine learning solutions in the following ways:

Simplify Deployment:

  • Quickly set up training, evaluation, and inference environments in an isolated, reproducible way.

Maintain Consistency:

  • Avoid compatibility issues by packaging dependencies with the code.

Streamline Workflow:

  • Execute tasks (like training, evaluation, and inference) effortlessly across services.

Docker Compose Services 🛠️

The Docker Compose configuration file docker-compose.yml defines three services:

🔷 train

  • Trains the MNIST model.
  • Checks for a checkpoint file in the shared volume. If found, resumes training from that checkpoint (see the sketch below).
  • Saves the final checkpoint as mnist_cnn.pt and exits.
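
A minimal sketch of this resume-and-save logic, assuming the shared volume is mounted at /opt/mount (as in docker-compose.yml below) and that the model is an ordinary torch.nn.Module; the function names are illustrative rather than the actual script's API:

import os

import torch

# Assumption: the "mnist" named volume is mounted at /opt/mount (see docker-compose.yml),
# and the final checkpoint name matches the one used by the services.
CHECKPOINT_PATH = "/opt/mount/mnist_cnn.pt"

def maybe_resume(model: torch.nn.Module) -> torch.nn.Module:
    """Load weights from the shared checkpoint if one exists; otherwise train from scratch."""
    if os.path.isfile(CHECKPOINT_PATH):
        print("Checkpoint file found. Resuming training from it.")
        model.load_state_dict(torch.load(CHECKPOINT_PATH))
    else:
        print("Checkpoint file not found. Starting training from scratch.")
    return model

def save_final_checkpoint(model: torch.nn.Module) -> None:
    """Persist the final weights so the evaluate and infer services can reuse them."""
    torch.save(model.state_dict(), CHECKPOINT_PATH)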

🔷 evaluate

  • Checks for the final checkpoint (mnist_cnn.pt) in the shared volume.
  • Evaluates the model and saves metrics in eval_results.json (see the sketch below).
  • The model code is imported rather than copy-pasted into eval.py.
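
A minimal sketch of the evaluation step. The output path mirrors eval_results.json in the shared volume and the metric names match the example shown later; the use of nll_loss assumes the model returns log-probabilities (as in the standard PyTorch MNIST example), and the function names are illustrative:

import json

import torch
import torch.nn.functional as F

def evaluate(model: torch.nn.Module, test_loader, device: str = "cpu") -> dict:
    """Compute average test loss and accuracy over the MNIST test set."""
    model.eval()
    test_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)  # assumed to be log-probabilities
            test_loss += F.nll_loss(output, target, reduction="sum").item()
            correct += (output.argmax(dim=1) == target).sum().item()
            total += target.size(0)
    return {"Test loss": test_loss / total, "Accuracy": 100.0 * correct / total}

def write_results(metrics: dict, path: str = "/opt/mount/eval_results.json") -> None:
    """Write the metrics where the shared volume makes them visible outside the container."""
    with open(path, "w") as f:
        json.dump(metrics, f)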

🔷 infer

  • Runs inference on sample MNIST images.
  • Saves the results (images with the predicted digits) in the results folder in the shared mnist volume and exits (see the sketch below).
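
A rough sketch of what the inference step might look like. The results directory, the number of samples, and the assumption that the model outputs log-probabilities are illustrative; the actual script may annotate or name the images differently:

import os
import random

import torch
from torchvision import datasets, transforms

RESULTS_DIR = "/opt/mount/results"  # assumed location inside the shared volume
DATA_DIR = "/opt/mount/data"        # matches the ./data bind mount in docker-compose.yml

def run_inference(model: torch.nn.Module, num_samples: int = 5) -> None:
    """Predict a few random MNIST test digits and save each image, named after its prediction."""
    os.makedirs(RESULTS_DIR, exist_ok=True)
    test_set = datasets.MNIST(DATA_DIR, train=False, download=True)  # items are (PIL image, int label)
    to_tensor = transforms.ToTensor()
    model.eval()
    with torch.no_grad():
        for i in random.sample(range(len(test_set)), num_samples):
            image, label = test_set[i]
            log_probs = model(to_tensor(image).unsqueeze(0))  # assumes log_softmax output
            confidence, pred = log_probs.exp().max(dim=1)
            image.save(os.path.join(RESULTS_DIR, f"{i}_actual_{label}_pred_{pred.item()}.png"))
            print(f"Actual Label: {label} | Predicted: {pred.item()} | (Confidence: {confidence.item():.2f})")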

Command-Line Arguments 🔧

The MNIST training script accepts the following command-line arguments:

| Argument | Description | Default |
|---|---|---|
| --batch-size | Input batch size for training | 64 |
| --epochs | Number of epochs to train | 10 |
| --lr | Learning rate | 0.01 |
| --momentum | SGD momentum | 0.5 |
| --seed | Random seed | 1 |
| --log-interval | How many batches to wait before logging training status | 10 |
| --num-processes | Number of processes to run the script on for distributed processing | 2 |
| --dry-run | Quickly check a single pass without full training | False |
| --save_model | Flag to save the trained model | True |
| --save-dir | Directory where the checkpoint will be saved | ./ |
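
As a rough illustration of how these flags might be wired up, a parser along the following lines would match the table; the defaults are taken from the table, and the actual script may differ in details:

import argparse

def get_args() -> argparse.Namespace:
    """Build the argument parser described in the table above (defaults assumed from it)."""
    parser = argparse.ArgumentParser(description="MNIST Hogwild training")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="input batch size for training")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of epochs to train")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--momentum", type=float, default=0.5, help="SGD momentum")
    parser.add_argument("--seed", type=int, default=1, help="random seed")
    parser.add_argument("--log-interval", type=int, default=10,
                        help="batches to wait before logging training status")
    parser.add_argument("--num-processes", type=int, default=2,
                        help="number of processes to run the script on")
    parser.add_argument("--dry-run", action="store_true",
                        help="quickly check a single pass without full training")
    parser.add_argument("--save_model", action="store_true", default=True,
                        help="save the trained model (defaults to True)")
    parser.add_argument("--save-dir", type=str, default="./",
                        help="directory where the checkpoint will be saved")
    return parser.parse_args()

if __name__ == "__main__":
    print(get_args())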

Docker Compose Configuration 📝

docker-compose.yml

version: '3.8'

services:
  # Trains the model and writes the final checkpoint (mnist_cnn.pt) to the shared volume
  train:
    build:
      context: .
      dockerfile: Dockerfile.train
    volumes:
      - mnist:/opt/mount
      - ./model:/opt/mount/model
      - ./data:/opt/mount/data

  # Loads the checkpoint from the shared volume and writes eval_results.json
  evaluate:
    build:
      context: .
      dockerfile: Dockerfile.eval
    volumes:
      - mnist:/opt/mount
      - ./model:/opt/mount/model
      - ./data:/opt/mount/data

  # Runs predictions on sample images and saves them to the results folder
  infer:
    build:
      context: .
      dockerfile: Dockerfile.infer
    volumes:
      - mnist:/opt/mount
      - ./data:/opt/mount/data

# Named volume shared by all three services
volumes:
  mnist:
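
All three services mount the named volume mnist at /opt/mount, which is how the checkpoint written by train (mnist_cnn.pt), the metrics written by evaluate (eval_results.json), and the images written by infer become visible to one another and persist between runs. The ./model and ./data bind mounts expose the shared model code (for train and evaluate) and the MNIST data from the host inside the containers.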

Instructions 🚀

1️⃣ Build Docker Images:

docker compose build
  • This command builds the Docker images for each service (train, evaluate, infer), ensuring that the necessary dependencies are installed and the code is properly packaged.

2️⃣ Run Services:

  • Train:

     docker compose run train

    This command starts the training process. It looks for an existing checkpoint in the volume and resumes training from it if one is found.

  • Evaluate:

     docker compose run evaluate

    The above command evaluates the trained model using the saved checkpoint and generates metrics like accuracy and test loss.

  • Inference:

    docker compose run infer

    The inference service runs predictions on a few random MNIST images and saves the output images with predicted labels.

3️⃣ Verify Results:

✍️ Checkpoint File:

  • Check that mnist_cnn.pt exists in the mnist volume.
    • If found: "Checkpoint file found." is printed.
    • If not found: "Checkpoint file not found!" is printed and the service exits with an error.

✍️ Evaluation Results:

  • Verify eval_results.json in the mnist volume.
    • Example format: {"Test loss": 0.0890245330810547, "Accuracy": 97.12}

✍️ Inference Results:

  • Check the results folder in the mnist volume for saved images with predicted numbers.

Results 📊

Here are some sample predicted images generated by the infer service:

  • Actual Label: 7 | Predicted: 7 | (Confidence: 1.00)
  • Actual Label: 9 | Predicted: 9 | (Confidence: 1.00)
  • Actual Label: 4 | Predicted: 4 | (Confidence: 1.00)
  • Actual Label: 1 | Predicted: 1 | (Confidence: 1.00)
