uditmanav17/assessments

Objective

This repo aims to demonstrate MLOps skills while solving a classification problem from Kaggle. To learn more about the problem statement, refer here.

Public Endpoints for Deployed App

Before diving into the code, go to the Deployment section and deploy the app using docker compose to get a better understanding of it.

You can easily deploy it on your own, (possibly) free of charge, in the cloud. Scroll down to Docker Playground Cloud Deployment in the Deployment section.

Architecture Diagram

System Diagram

Code Structure / Services

  • notebooks - Contains EDA (exploratory data analysis) and model development steps, including all preprocessing and evaluation. Once the preprocessing steps are defined and a model is selected, the final code is moved to model_training/train.py for CI/CD.
  • src - Contains the frontend and backend services along with the champion model training script.
    • backend - REST API endpoints developed using FastAPI.
    • frontend - Basic frontend app developed using Streamlit.
    • model_training - Depending on model/data size and training time, this could be executed on a locally hosted runner.
      • train_boilerplate - Boilerplate code covering all the steps for approaching a classification problem.
      • train - The final selected model and preprocessing code to be executed for the train/predict pipeline.
  • docker-compose - Compose file that starts the backend and frontend services to run the application.

Operations side of MLOps / Workflows Description

There are two GitHub workflows, one each for managing the frontend and backend services, described below -

  • frontend_container - Monitors file changes in src/frontend/. Any change to .py files there triggers this workflow, which rebuilds the frontend container image for the application and pushes it to Docker Hub. More details here.
  • train_model - Monitors file changes in src/backend/ and src/model_training/train.py. Any change there triggers this workflow, which trains the model described in train.py, packages it in a Docker container with the backend service, and pushes it to Docker Hub. More details here.
  • Containers on the remote server are updated via polling, implemented here. This script is set up as a cron job with a 300 s interval. It compares the local and remote container hashes and deploys the updated container in case of a mismatch.
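
The polling step described above can be sketched roughly as follows. This is not the repository's actual script: the function name poll_once and its three callables (get_local_digest, get_remote_digest, deploy) are hypothetical, and the digest lookups are passed in so that the comparison logic stays independent of how the hashes are really fetched.

```python
from typing import Callable


def poll_once(
    get_local_digest: Callable[[], str],
    get_remote_digest: Callable[[], str],
    deploy: Callable[[str], None],
) -> bool:
    """Compare image digests once and redeploy on mismatch.

    Returns True if a deployment was triggered, False otherwise.
    All three callables are hypothetical hooks, not part of this repo.
    """
    local = get_local_digest()
    remote = get_remote_digest()
    if local == remote:
        return False  # the running container already matches the registry
    deploy(remote)  # e.g. pull the new image and restart the service
    return True


if __name__ == "__main__":
    # Toy run with hard-coded digests standing in for real lookups.
    deployed = []
    triggered = poll_once(lambda: "sha256:aaa", lambda: "sha256:bbb", deployed.append)
    print(triggered, deployed)
```

Scheduling a script like this from cron every five minutes would correspond to the 300 s interval mentioned above.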

Deployment

  • Local deployment

    • Install Docker. Instructions available here. Make sure Docker is up and running before proceeding.
    • Install Git. Instructions here.
    • Clone the repo and run compose
    git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
    docker compose --profile app up
    
    • --profile app will start both the frontend and backend services, on localhost:8080 and localhost:8000 respectively.
  • Docker Playground Cloud Deployment

    • Navigate to Docker Playground.
    • Log in using your Docker account. Click Start. This will direct you to a new page.
    • Click Add New Instance on the left pane. Then run the following commands in the terminal -
    git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
    docker compose --profile app up
    
    • This will open up port 8000 for the backend endpoints and port 8080 for the frontend.
    • To access the application, click on the port numbers next to the OPEN PORT button to visit the frontend/backend service.
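
After running compose, a quick check like the sketch below can confirm that both services are accepting connections. It only tests that the TCP ports from the steps above (8000 for the backend, 8080 for the frontend) are reachable; it does not call any application endpoint. The helper name is_port_open is hypothetical and not part of this repo.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the deployment notes: 8000 backend, 8080 frontend.
    for name, port in (("backend", 8000), ("frontend", 8080)):
        status = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

On Docker Playground the services are exposed through generated URLs instead of localhost, so this check is mainly useful for the local deployment.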

Future Work / Improvements

  • Use an event-driven approach instead of polling for deployment.
  • Data validation checks on uploaded files.
  • Data versioning - DVC
  • Experiment and artifacts tracking - MLFlow, WandB
  • Use better methods to save and load models, such as joblib, rather than plain pickle.
  • Serverless on-demand architecture.
  • Run the backend server with gunicorn (managing uvicorn workers) instead of standalone uvicorn.
  • More tips here.
  • If using a deep learning model, try quantization and converting the model to ONNX format for better inference speed and lower memory usage.
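
As one illustration of the data validation item above, a check on an uploaded CSV might look like the sketch below. The validate_csv helper and the column names in the example are hypothetical; the real schema depends on the Kaggle dataset, which this README does not spell out.

```python
import csv
import io


def validate_csv(text: str, required_columns: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = required_columns - header
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    row_count = 0
    for row in reader:
        row_count += 1
        # Short rows yield None for absent fields, so normalise to "".
        empty = sorted(c for c in required_columns if not (row.get(c) or "").strip())
        if empty:
            # reader.line_num is the number of source lines read so far.
            problems.append(f"line {reader.line_num}: empty values in {empty}")
    if row_count == 0:
        problems.append("no data rows")
    return problems


if __name__ == "__main__":
    # Hypothetical columns, for illustration only.
    sample = "age,income,label\n30,1000,1\n41,,0\n"
    print(validate_csv(sample, {"age", "income", "label"}))
```

Returning a list of problems instead of raising on the first one lets the frontend show every issue with the uploaded file at once.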

About

Basic E2E ML Ops pipeline
