# Basic MLOps pipeline for Santander Customer Transaction Prediction
This repo aims to demonstrate MLOps skills while solving a classification problem from Kaggle. To learn more about the problem statement, refer here.
Before diving into the code, please go to the Deployment section and deploy the app using Docker Compose to get a better understanding of it. You can deploy it yourself easily and (possibly) free of charge in the cloud; scroll down to the Docker Playground Cloud Deployment part of the Deployment section.
## Repo structure

- `notebooks` - Contains EDA (exploratory data analysis) and model development steps, including all preprocessing and evaluation. Once the preprocessing steps are defined and a model is selected, the final code is moved to `model_training/train.py` for CI/CD.
- `src` - Contains the frontend and backend services along with the champion model training script.
  - `backend` - REST API endpoints developed using FastAPI.
  - `frontend` - Basic frontend app developed using Streamlit.
  - `model_training` - Depending on model/data size and model training time, this could be executed on a locally hosted runner.
    - `train_boilerplate` - Boilerplate code for all steps to approach a classification problem.
    - `train` - Final selected model and preprocessing code to be executed by the train/predict pipeline.
- `docker-compose` - Compose file which starts the backend and frontend services to run the application.
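The split between `train_boilerplate` and `train` suggests a reusable preprocessing-plus-model pattern. The sketch below illustrates what a `train.py` like this might bundle; the feature data, scaler, and model choice here are illustrative assumptions, not the repo's actual code:

```python
# Hypothetical sketch: preprocessing and classifier bundled in one
# scikit-learn Pipeline so the train and predict paths stay consistent.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline() -> Pipeline:
    # Scaling + model in one object; swap in the real preprocessing steps here.
    return Pipeline(
        [
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]
    )


def train(X, y) -> Pipeline:
    pipe = build_pipeline()
    pipe.fit(X, y)
    return pipe


if __name__ == "__main__":
    # Stand-in for the Santander data: a synthetic binary problem.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = train(X_tr, y_tr)
    print(f"holdout accuracy: {model.score(X_te, y_te):.3f}")
```

Keeping preprocessing inside the pipeline means the backend only has to load one artifact and call `predict` on raw feature rows.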
## Workflows

There are two GitHub workflows, one each for managing the frontend and backend services, described below:

- `frontend_container` - Monitors file changes in `src/frontend/`; any change to `.py` files here triggers the workflow, which rebuilds the frontend container image for the application and pushes it to Docker Hub. More details here.
- `train_model` - Monitors file changes in `src/backend/` and `src/model_training/train.py`; any change here triggers the workflow, which trains the model described in `train.py`, packs it into a Docker container with the backend service for the application, and pushes it to Docker Hub. More details here.
- Updating containers on the remote server is done via polling, implemented here. The script is set up as a `CRON` job with a time interval of `300s`. It compares the local and remote container hashes and deploys the updated container in case of a mismatch.
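The polling deploy described above can be sketched roughly as follows. The image name, digest-extraction commands, and file layout here are assumptions for illustration; the repo's actual script may differ:

```shell
#!/bin/sh
# Hypothetical polling-deploy sketch: compare the local image's digest with
# the registry's and redeploy on mismatch. Intended to run from cron.

IMAGE="uditmanav17/assessments-backend:latest"  # assumed image name

digests_differ() {
    # True (exit 0) only when both digests are non-empty and different.
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

redeploy_if_stale() {
    # RepoDigest of the local image is the manifest digest it was pulled at.
    local_digest=$(docker image inspect --format '{{index .RepoDigests 0}}' "$IMAGE" 2>/dev/null | grep -o 'sha256:[0-9a-f]*' || true)
    # Ask the registry for the current manifest digest of the same tag.
    remote_digest=$(docker manifest inspect "$IMAGE" 2>/dev/null | grep -o 'sha256:[0-9a-f]*' | head -n 1 || true)
    if digests_differ "$local_digest" "$remote_digest"; then
        docker compose pull && docker compose --profile app up -d
    fi
}

redeploy_if_stale
# Schedule via cron every 300s, e.g.:
#   */5 * * * * /path/to/poll_deploy.sh
```

Because both values are missing (empty) when Docker or the registry is unreachable, `digests_differ` refuses to redeploy on partial information.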
## Deployment

### Local deployment

1. Install Docker. Instructions available here. Make sure Docker is up and running before proceeding.
2. Install Git. Instructions here.
3. Clone the repo and run compose:

   ```
   git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
   docker compose --profile app up
   ```

   `--profile app` will start both the frontend and backend services, on ports `localhost:8080` and `localhost:8000` respectively.

### Docker Playground Cloud Deployment

1. Navigate to docker playground.
2. Login using your Docker account and click Start. This will direct you to a new page.
3. Click `Add New Instance` on the left pane, then run the following commands in the terminal:

   ```
   git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
   docker compose --profile app up
   ```

4. This will open up port `8000` for the backend endpoints and `8080` for the frontend.
5. To access the application, click on the port numbers next to the `OPEN PORT` button to visit the frontend/backend service.
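Once the stack is up (locally or on the playground), a quick way to check that the backend came up is FastAPI's auto-generated docs endpoint (`/docs` is FastAPI's default path; adjust host/port if you changed the compose file):

```shell
# Smoke-test the backend: FastAPI serves interactive API docs at /docs
curl -I http://localhost:8000/docs
# The frontend should be reachable in a browser at http://localhost:8080
```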
## Future improvements

- Use an event-driven approach instead of polling for deployment.
- Data validation checks on uploaded files.
- Data versioning - DVC.
- Experiment and artifact tracking - MLflow, WandB.
- Better methods to save and load models, such as joblib - don't pickle.
- Serverless on-demand architecture.
- Run the backend server with `gunicorn` instead of `uvicorn`. More tips here.
- If using a deep learning model, try quantization and converting the model to ONNX format for better inference speed and lower memory usage.
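For the joblib point above, saving and reloading a fitted model is a few lines. This is a minimal sketch; the file name, model, and compression level are illustrative, not the repo's actual choices:

```python
# Hypothetical sketch: persist a fitted scikit-learn model with joblib,
# which handles the large numpy arrays inside estimators efficiently.
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def save_model(model, path):
    # compress=3 trades a little CPU for a noticeably smaller artifact
    joblib.dump(model, path, compress=3)


def load_model(path):
    return joblib.load(path)


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "champion_model.joblib"
        save_model(model, path)
        restored = load_model(path)
        # Round-tripped model must give identical predictions
        assert (restored.predict(X) == model.predict(X)).all()
```

As with pickle, only load joblib artifacts you trust, and reload them under the same library versions they were saved with.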