Infrastructure as code to set up an MLflow tracking server in AWS.

tjoliveira/mlflow-tracking-server

MLflow Tracking Server

This repo contains infrastructure as code to quickly set up an MLflow Tracking server on AWS.

It uses the following AWS services:

  • S3
  • RDS (MySQL database)
  • Elastic Container Registry (ECR)
  • AWS App Runner

MLflow Architecture

MLflow is an open source platform for managing the machine learning lifecycle. In this project we use it to track experiments. MLflow Tracking records experiments through its Python, REST, R, and Java APIs. The elements tracked include model parameters, metrics, the models resulting from the experiments, and informative plots.
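
As an illustration of the REST route, the sketch below builds (without sending) a request to the MLflow REST endpoint that logs a single metric value. The host name, run ID, and metric are hypothetical placeholders, and a request sent to this deployment would also need the proxy credentials described in this README.

```python
import json
import time
import urllib.request


def build_log_metric_request(tracking_uri, run_id, key, value, step=0):
    """Build (but do not send) a POST request that logs one metric value."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=tracking_uri.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_log_metric_request(
    "https://example.invalid", "abc123", "accuracy", 0.95
)
print(request.get_method(), request.full_url)
```

In everyday use the Python client hides these calls; the sketch only shows what travels over HTTP.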

The components of a tracking server are:

  • HTTP server: the host that runs the MLflow Tracking server;
  • Backend store: a database used to store experiment metadata, parameters, and metrics;
  • Artifact store: a storage location for artifacts, the file outputs of experiments, such as models and plots.

The architecture of this project follows a scenario described in the tracking server documentation, in which the MLflow Tracking server runs on a remote host (in this case AWS App Runner), with the artifact store in an S3 bucket and the backend store in a MySQL database.
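
To make the scenario concrete, here is a minimal sketch of how a tracking server could be pointed at a MySQL backend store and an S3 artifact store. It is not the command this repo's container runs: the database user, host, bucket name, port, and the PyMySQL driver are all assumptions made for illustration.

```python
from urllib.parse import quote_plus


def mysql_backend_uri(user, password, host, database, port=3306):
    """SQLAlchemy-style URI for a MySQL backend store (PyMySQL driver assumed)."""
    # Percent-encode the credentials so characters such as "@" stay unambiguous.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def mlflow_server_command(backend_uri, bucket, port=5000):
    """Argument list that starts a tracking server with the given stores."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,
        "--default-artifact-root", f"s3://{bucket}",
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


# Hypothetical values, for illustration only.
backend = mysql_backend_uri("mlflow", "p@ss", "db.example.invalid", "mlflow")
command = mlflow_server_command(backend, "my-artifact-bucket")
print(" ".join(command))
```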

The tracking server sits behind an Nginx reverse proxy that requires authentication with a username and password. The user sets the credentials when deploying the infrastructure.
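
Assuming the proxy uses HTTP Basic authentication (the usual Nginx mechanism for username and password protection, though not stated explicitly here), every request carries an Authorization header derived from the credentials. The MLflow client builds this header itself when MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD are set; the sketch below shows the equivalent for other HTTP clients, with placeholder credentials.

```python
import base64


def basic_auth_header(username, password):
    """Return the HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


# Placeholder credentials, for illustration only.
print(basic_auth_header("user", "pass"))
```

Basic authentication only encodes the credentials, it does not encrypt them, so the tracking URL should be served over HTTPS.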

image

Requirements

The following tools should be installed (versions are just for reference):

Please configure the AWS CLI with your Access Keys.
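
A quick way to check the prerequisites before deploying is to look for the executables on the PATH. The sketch below assumes that Terraform and the AWS CLI (the two tools whose commands appear in this README) are the ones to check; adjust the list to your setup.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed tool list: the executables for Terraform and the AWS CLI.
REQUIRED_TOOLS = ["terraform", "aws"]

missing = missing_tools(REQUIRED_TOOLS)
print("Missing tools:", ", ".join(missing) if missing else "none")
```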

Instructions

  1. Clone the repository
git clone https://github.com/tjoliveira/mlflow-tracking-server
  2. Open the mlflow-tracking-server/infrastructure directory
cd mlflow-tracking-server/infrastructure
  3. Initialize Terraform
terraform init
  4. Deploy the infrastructure to AWS
terraform apply

You will be prompted to input the following information:

  • aws_access_key_id
  • aws_secret_key
  • mlflow_tracking_user (set by the user)
  • mlflow_tracking_password (set by the user)

The console will output the MLflow tracking URL as seen in the image below.

image

  5. Destroy the infrastructure when it is no longer needed
terraform destroy
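
The interactive prompts can be avoided by supplying the values through the environment, since Terraform reads any input variable named NAME from an environment variable called TF_VAR_NAME. The sketch below assumes that the four prompted names are Terraform input variables; the helper functions and the placeholder values are hypothetical.

```python
import os
import subprocess


def terraform_env(variables, base_env=None):
    """Copy the environment and add one TF_VAR_<name> entry per variable."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in variables.items():
        env[f"TF_VAR_{name}"] = value
    return env


def terraform_apply(variables, workdir="mlflow-tracking-server/infrastructure"):
    """Run terraform apply with the variables supplied through the environment."""
    subprocess.run(
        ["terraform", "apply"],
        cwd=workdir,
        env=terraform_env(variables),
        check=True,  # raise if Terraform exits with an error
    )


# Placeholder values; terraform_apply is defined but not called here.
example_env = terraform_env(
    {
        "mlflow_tracking_user": "<USERNAME>",
        "mlflow_tracking_password": "<PASSWORD>",
    },
    base_env={},
)
print(sorted(example_env))
```

Terraform still asks for confirmation before applying, so the run stays interactive at that last step.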

Example of client code to test the server

import mlflow
import random
import os
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

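# Credentials checked by the Nginx reverse proxy (chosen at deploy time)
os.environ["MLFLOW_TRACKING_USERNAME"] = "<USERNAME>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<PASSWORD>"

# Tracking URL printed by terraform apply
mlflow.set_tracking_uri("<TRACKING_URI>")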

random.seed(10)

iris = datasets.load_iris()
x = iris.data
y = iris.target

x_train, x_test, y_train, y_test = train_test_split(
    x, 
    y, 
    test_size=0.33, 
    random_state=42
)

clf = DecisionTreeClassifier(random_state=0)

# Autologging records parameters, metrics, and the fitted model on the tracking server
mlflow.sklearn.autolog()
clf.fit(x_train, y_train)
print(clf.score(x_test, y_test))

Application

image

image

image

References

[1] https://registry.terraform.io/providers/hashicorp/aws/latest/docs

[2] https://medium.com/mlops-community/deploying-mlflow-in-aws-app-runner-cc6caf7fb8e3

[3] https://docs.aws.amazon.com/
