This repo contains infrastructure as code to quickly set up an MLflow Tracking server on AWS.
It uses the following AWS services:
- S3
- RDS (MySQL database)
- Elastic Container Registry (ECR)
- AWS App Runner
MLflow is an open source platform for managing the machine learning lifecycle. This project uses it to track experiments. MLflow Tracking provides Python, REST, R, and Java APIs for logging experiments. The tracked elements include model parameters, metrics, the models produced by the experiments, and informative plots.
The components of a tracking server are:
- HTTP server: the host that runs the MLflow Tracking server;
- Backend store: a database used to store experiment metadata, parameters, and metrics;
- Artifact store: a volume used to store artifacts, which are the file outputs of experiments, including files, plots, etc.
The architecture of this project follows a scenario described in the MLflow Tracking server documentation: the tracking server runs on a remote host (here, AWS App Runner), with the artifact store in an S3 bucket and the backend store in a MySQL database.
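To make this layout concrete, the sketch below assembles the kind of `mlflow server` command such a deployment could run inside its container. It is not taken from this repository: the function name, host names, database name, and bucket name are hypothetical placeholders, and the `mysql+pymysql` scheme assumes the PyMySQL driver is installed. Only the `--backend-store-uri`, `--default-artifact-root`, `--host`, and `--port` options are standard MLflow CLI flags.

```python
from urllib.parse import quote


def build_mlflow_server_command(db_user, db_password, db_host, db_name,
                                bucket, db_port=3306, port=5000):
    """Return the argv list for an MLflow server backed by MySQL and S3.

    Every argument value is a hypothetical placeholder supplied by the caller.
    """
    # Percent-encode the credentials so characters such as "@" or "/"
    # do not break the database URI.
    user = quote(db_user, safe="")
    password = quote(db_password, safe="")
    backend_uri = f"mysql+pymysql://{user}:{password}@{db_host}:{db_port}/{db_name}"
    artifact_root = f"s3://{bucket}"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,      # experiment metadata, params, metrics
        "--default-artifact-root", artifact_root,  # models, plots, other files
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
```

The returned list can be passed to `subprocess.run` or joined into a container entrypoint. Keeping the two store locations as separate arguments mirrors the split between backend store and artifact store described above.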
The tracking server sits behind an Nginx reverse proxy that requires authentication with a username and password. The user sets these credentials when deploying the infrastructure.
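Clients other than the MLflow Python client (for example, a script calling the REST API directly) need to send the credentials themselves. The sketch below assumes the proxy uses standard HTTP Basic authentication, which is what the MLflow client sends when `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are set. The helper name `basic_auth_header` is illustrative.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic `Authorization` header for the given credentials."""
    # Basic auth is the base64 encoding of "username:password".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The resulting dictionary can be merged into the headers of any HTTP request sent to the tracking URL. Because Basic authentication only encodes the credentials and does not encrypt them, it should be used over HTTPS.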
The following tools should be installed (versions are just for reference):
Please configure the AWS CLI with your Access Keys.
- Clone the repository
git clone https://github.com/mlflow-tracking-server
- Open the mlflow-tracking-server/infrastructure directory
cd mlflow-tracking-server/infrastructure
- Initialize Terraform
terraform init
- Deploy the infrastructure to AWS.
terraform apply
You will be prompted to input the following information:
- aws_access_key_id
- aws_secret_key
- mlflow_tracking_user (set by the user)
- mlflow_tracking_password (set by the user)
When the deployment finishes, the console outputs the MLflow tracking URL.
- Destroy the infrastructure
terraform destroy
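The values requested during `terraform apply` can also be supplied up front, since Terraform reads any environment variable named `TF_VAR_<name>` as the input variable `<name>`. The sketch below assumes the prompt names listed above are the Terraform variable names. The helper functions are hypothetical and not part of this repository.

```python
import os
import subprocess


def terraform_env(variables, base_env=None):
    """Return a copy of the environment with TF_VAR_<name> entries added."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in variables.items():
        env[f"TF_VAR_{name}"] = value
    return env


def terraform_apply(variables, workdir="."):
    """Run `terraform apply` with the given input variables pre-filled.

    Terraform still asks for confirmation of the plan before applying it.
    """
    return subprocess.run(
        ["terraform", "apply"],
        cwd=workdir,
        env=terraform_env(variables),
        check=True,
    )
```

A call might look like `terraform_apply({"mlflow_tracking_user": "...", "mlflow_tracking_password": "..."}, workdir="mlflow-tracking-server/infrastructure")`, with the AWS keys added in the same way. Reading the secret values from a secrets manager or the shell environment avoids committing them to source control.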
import os
import random

import mlflow
import mlflow.sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Credentials chosen at deploy time; the MLflow client reads them from the environment.
os.environ["MLFLOW_TRACKING_USERNAME"] = "<USERNAME>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<PASSWORD>"

# Tracking URL printed by `terraform apply`.
mlflow.set_tracking_uri("<TRACKING_URI>")

random.seed(10)

# Load the Iris dataset and hold out a third of it for testing.
iris = datasets.load_iris()
x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(
    x,
    y,
    test_size=0.33,
    random_state=42,
)

clf = DecisionTreeClassifier(random_state=0)

# Autologging records parameters, metrics, and the fitted model.
mlflow.sklearn.autolog()
clf.fit(x_train, y_train)
clf.score(x_test, y_test)
[1] https://registry.terraform.io/providers/hashicorp/aws/latest/docs
[2] https://medium.com/mlops-community/deploying-mlflow-in-aws-app-runner-cc6caf7fb8e3
[3] https://docs.aws.amazon.com/