
Commit 838a07a

tyapochkin authored and skrawcz committed
Add an example of using Hamilton in AWS SageMaker
1 parent 053f8e5 commit 838a07a

9 files changed: +248 -0 lines

examples/aws/sagemaker/README.md (+74)

@@ -0,0 +1,74 @@
# Deploying Hamilton Functions as an AWS SageMaker Processing Job

[AWS SageMaker](https://aws.amazon.com/sagemaker/) is a managed platform for building, training, and deploying machine learning (ML) models. This guide shows how to run a "hello-world" [processing job](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) on SageMaker whose logic is written as Hamilton functions.

## Prerequisites

- **AWS CLI Setup**: Ensure that the AWS CLI is configured on your machine. Follow the [Quick Start guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html) for setup instructions.
- **Docker**: Docker must be installed locally to build and push the container image.

## Step-by-Step Guide

### 1. Build the Docker Image

Navigate to the `container/` directory and build the Docker image. The `--platform linux/amd64` flag produces an x86-64 image that runs on the `ml.t3.medium` instance used in the notebook, even if you build on an ARM machine:

```shell
cd container/ && docker build --platform linux/amd64 -t aws-sagemaker-hamilton . && cd ..
```

### 2. Create an AWS ECR Repository

In the commands below, replace `111122223333` with your AWS account ID. If you use a region other than `us-east-1`, replace that as well.

- **Authenticate Docker to Amazon ECR**:

Retrieve an authentication token and use it to log your Docker client in to Amazon Elastic Container Registry (ECR):

```shell
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com
```

- **Create the ECR Repository**:

```shell
aws ecr create-repository --repository-name aws-sagemaker-hamilton --region us-east-1 --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE
```
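
If you prefer to script this step from Python, the `aws ecr create-repository` call above corresponds to the boto3 ECR client's `create_repository`. The following is a minimal sketch and not part of the original example. It assumes that `boto3` is available (it is a dependency of the `sagemaker` package installed in step 5) and that credentials are configured. `create_hamilton_repo` is a hypothetical helper name. The client is passed in as an argument, so the function can be exercised without calling AWS.

```python
def create_hamilton_repo(ecr_client, repo_name: str = "aws-sagemaker-hamilton") -> str:
    """Hypothetical helper mirroring the `aws ecr create-repository` command above.

    `ecr_client` is expected to be `boto3.client("ecr", region_name="us-east-1")`.
    Returns the repository URI reported by ECR.
    """
    response = ecr_client.create_repository(
        repositoryName=repo_name,
        # Same settings as the CLI flags --image-scanning-configuration / --image-tag-mutability
        imageScanningConfiguration={"scanOnPush": True},
        imageTagMutability="MUTABLE",
    )
    return response["repository"]["repositoryUri"]


# Usage (requires AWS credentials):
#   import boto3
#   uri = create_hamilton_repo(boto3.client("ecr", region_name="us-east-1"))
```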

### 3. Deploy the Image to AWS ECR

Tag the local image with the ECR repository URI and push it. Again, substitute your own account ID:

```shell
docker tag aws-sagemaker-hamilton 111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-sagemaker-hamilton:latest
docker push 111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-sagemaker-hamilton:latest
```
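
Three places share one image URI format: the tag used by `docker tag`/`docker push` above, and the `image_uri` placeholder in the notebook. That format is `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The small Python sketch below builds the URI and rejects account IDs that are not 12 digits, for example an unreplaced `<account_number>` placeholder. The function name `ecr_image_uri` is hypothetical.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI like the one pushed above (hypothetical helper)."""
    # AWS account IDs are exactly 12 digits; a placeholder such as <account_number> fails this check.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("111122223333", "us-east-1", "aws-sagemaker-hamilton"))
# 111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-sagemaker-hamilton:latest
```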

### 4. Create a Simple Role for the AWS SageMaker ScriptProcessor

- **Create the Role**:

Create an IAM role that the SageMaker service is allowed to assume. The trust policy below grants no permissions by itself; permissions are attached in the next step.

```shell
aws iam create-role --role-name SageMakerScriptProcessorRole --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{ "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
```
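
The inline JSON passed to `--assume-role-policy-document` is a trust policy: it only allows `sagemaker.amazonaws.com` to call `sts:AssumeRole` on the role. Quoting JSON inside a shell string is error-prone, so one option is to generate the document from a Python dict. Below is a minimal sketch. `sagemaker_trust_policy` is a hypothetical name, and its output parses to the same document as the inline string above.

```python
import json


def sagemaker_trust_policy() -> str:
    """Return the assume-role (trust) policy used by `aws iam create-role` above."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the SageMaker service may assume this role.
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy)


print(sagemaker_trust_policy())
```

You could write the result to a file such as `trust-policy.json` and pass `--assume-role-policy-document file://trust-policy.json` instead of the inline string.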

- **Attach Policies to the Role**:

As an example, this grants full access to S3, ECR, and SageMaker. In production environments, restrict these permissions to the specific resources the job needs.

```shell
aws iam attach-role-policy --role-name SageMakerScriptProcessorRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name SageMakerScriptProcessorRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name SageMakerScriptProcessorRole --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```
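
The three `attach-role-policy` calls can also be written as a loop over the policy ARNs. The sketch below assumes a boto3 IAM client (`boto3.client("iam")`, whose `attach_role_policy` takes `RoleName` and `PolicyArn`). `attach_policies` is a hypothetical helper, and the client is injected so the function can be tried with a stand-in object. Keeping the ARNs in one tuple also makes it easy to swap in narrower policies for production.

```python
# Managed policies attached by the commands above (broad, example-only permissions).
EXAMPLE_POLICY_ARNS = (
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)


def attach_policies(iam_client, role_name="SageMakerScriptProcessorRole", policy_arns=EXAMPLE_POLICY_ARNS):
    """Attach each managed policy to `role_name`; return the ARNs in the order attached.

    `iam_client` is expected to be `boto3.client("iam")` (hypothetical helper).
    """
    attached = []
    for arn in policy_arns:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
        attached.append(arn)
    return attached
```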
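
### 5. Install Additional Requirements

From `examples/aws/sagemaker/`, install the SageMaker Python SDK used by the notebook:

```shell
pip install -r requirements.txt
```

### 6. Execute the Processing Job

See [notebook.ipynb](notebook.ipynb) for a complete example of launching the processing job. It creates a `ScriptProcessor` from the pushed image and the role, then runs `processing.py`, which writes `output_table.csv` and `dag_visualization.svg` to S3.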

examples/aws/sagemaker/app/__init__.py

Whitespace-only changes.

examples/aws/sagemaker/app/functions.py (+38)

@@ -0,0 +1,38 @@
import pandas as pd

from hamilton.function_modifiers import extract_columns


@extract_columns("spend", "signups")
def raw_table(input_table: pd.DataFrame) -> pd.DataFrame:
    return input_table


def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling 3-week average spend."""
    return spend.rolling(3).mean()


def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Spend per signup, i.e. the cost of acquiring each signup."""
    return spend / signups


def spend_mean(spend: pd.Series) -> float:
    """Shows a function creating a scalar: the mean of the entire spend column."""
    return spend.mean()


def spend_zero_mean(spend: pd.Series, spend_mean: float) -> pd.Series:
    """Shows a function that takes a scalar: subtracts the mean so spend has zero mean."""
    return spend - spend_mean


def spend_std_dev(spend: pd.Series) -> float:
    """Function that computes the standard deviation of the spend column."""
    return spend.std()


def spend_zero_mean_unit_variance(spend_zero_mean: pd.Series, spend_std_dev: float) -> pd.Series:
    """Function showing one way to make spend have zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
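
Hamilton builds its DAG from these signatures. Each function becomes a node named after the function, and each parameter name refers to another node or to an input. Here the input is `input_table`, and `@extract_columns` additionally exposes `spend` and `signups` as nodes.

The toy resolver below makes that wiring concrete. It is dependency-free and evaluates functions by matching parameter names to function or input names. It rests on simplifying assumptions: no decorators, no cycle detection, and plain lists instead of pandas Series. It is **not** how Hamilton is implemented, and `resolve` and the list-based functions are hypothetical.

```python
import inspect
from statistics import mean, stdev


def resolve(funcs, final_vars, inputs):
    """Toy name-based DAG evaluation: a parameter named `x` is satisfied by the
    function named `x` or by `inputs["x"]`. Each node is computed at most once."""
    registry = {f.__name__: f for f in funcs}
    cache = dict(inputs)

    def compute(name):
        if name not in cache:
            fn = registry[name]  # KeyError means no function or input provides `name`
            kwargs = {p: compute(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return {name: compute(name) for name in final_vars}


# List-based stand-ins for a few of the pandas functions above.
def spend_per_signup(spend, signups):
    return [s / n for s, n in zip(spend, signups)]


def spend_mean(spend):
    return mean(spend)


def spend_zero_mean(spend, spend_mean):
    return [s - spend_mean for s in spend]


def spend_std_dev(spend):
    return stdev(spend)  # sample standard deviation, like pandas' default ddof=1


def spend_zero_mean_unit_variance(spend_zero_mean, spend_std_dev):
    return [x / spend_std_dev for x in spend_zero_mean]


result = resolve(
    [spend_per_signup, spend_mean, spend_zero_mean, spend_std_dev, spend_zero_mean_unit_variance],
    final_vars=["spend_per_signup", "spend_zero_mean_unit_variance"],
    # Columns of the sample CSV in this commit
    inputs={"spend": [10, 10, 20, 40, 40, 50], "signups": [1, 10, 50, 100, 200, 400]},
)
print(result["spend_per_signup"])  # [10.0, 1.0, 0.4, 0.4, 0.2, 0.125]
```

In Hamilton itself, the equivalent request is `dr.execute([...], inputs={"input_table": df})`, as in `processing.py`.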

examples/aws/sagemaker/container/Dockerfile (+11)

@@ -0,0 +1,11 @@
FROM python:3.11-slim-buster

RUN apt-get update && apt-get install -y graphviz

COPY requirements.txt ./

RUN pip install -r requirements.txt

ENV HAMILTON_TELEMETRY_ENABLED=false

ENTRYPOINT ["python3"]

examples/aws/sagemaker/container/requirements.txt (+2)

@@ -0,0 +1,2 @@
pandas
sf-hamilton[visualization]

examples/aws/sagemaker/data/input_table.csv (+7)

@@ -0,0 +1,7 @@
signups,spend
1,10
10,10
50,20
100,40
200,40
400,50

examples/aws/sagemaker/notebook.ipynb (+86)

@@ -0,0 +1,86 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "script_processor = ScriptProcessor(\n",
    "    command=['python3'],\n",
    "    image_uri='<account_number>.dkr.ecr.<region>.amazonaws.com/aws-sagemaker-hamilton:latest',  # Change to the actual URI\n",
    "    role='arn:aws:iam::<account_number>:role/SageMakerScriptProcessorRole',  # Change to the actual role ARN\n",
    "    instance_count=1,\n",
    "    instance_type='ml.t3.medium'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Container-side input and output paths must be under /opt/ml/processing/\n",
    "script_processor.run(\n",
    "    code='processing.py',\n",
    "    inputs=[\n",
    "        ProcessingInput(\n",
    "            source='data/',\n",
    "            destination='/opt/ml/processing/input/data'\n",
    "        ),\n",
    "        ProcessingInput(\n",
    "            source='app/',\n",
    "            destination='/opt/ml/processing/input/code/app'\n",
    "        )\n",
    "    ],\n",
    "    outputs=[\n",
    "        ProcessingOutput(\n",
    "            source='/opt/ml/processing/output/',\n",
    "            destination='s3://path/to/output/directory'  # Change to the actual URI\n",
    "        )\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the job finishes, the following new files will appear in `s3://path/to/output/directory`:\n",
    "- `output_table.csv`\n",
    "- `dag_visualization.svg`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hamilton",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
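
The markdown cell above lists the two files the job should write. The sketch below checks for them with the boto3 S3 client's `list_objects_v2`. It assumes `boto3` is available, which holds because it is a dependency of `sagemaker`. `missing_outputs` is a hypothetical helper, and the client is injected so the logic can be tested offline. It reads only the first page of results (up to 1,000 keys), which is enough for this two-file example.

```python
from urllib.parse import urlparse

# File names listed in the notebook's final markdown cell.
EXPECTED_OUTPUTS = ("output_table.csv", "dag_visualization.svg")


def missing_outputs(s3_client, output_uri: str, expected=EXPECTED_OUTPUTS) -> list:
    """Return the expected file names not (yet) present under `output_uri` (an s3:// URI)."""
    parsed = urlparse(output_uri)
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # treat the URI as a "directory" prefix
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent when nothing matches the prefix.
    present = {obj["Key"][len(prefix):] for obj in response.get("Contents", [])}
    return [name for name in expected if name not in present]


# Usage (requires AWS credentials):
#   import boto3
#   print(missing_outputs(boto3.client("s3"), "s3://path/to/output/directory"))
```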

examples/aws/sagemaker/processing.py (+29)

@@ -0,0 +1,29 @@
import pandas as pd
from app import functions  # app/ is mounted at /opt/ml/processing/input/code/app (see notebook.ipynb)

from hamilton import driver

if __name__ == "__main__":

    df = pd.read_csv("/opt/ml/processing/input/data/input_table.csv")

    dr = driver.Driver({}, functions)

    inputs = {"input_table": df}

    output_columns = [
        "spend",
        "signups",
        "avg_3wk_spend",
        "spend_per_signup",
        "spend_zero_mean_unit_variance",
    ]

    # DAG visualization (rendered with graphviz, which is installed in the Docker image)
    dot = dr.visualize_execution(final_vars=output_columns, inputs=inputs)
    with open("/opt/ml/processing/output/dag_visualization.svg", "wb") as svg_out:
        svg_out.write(dot.pipe(format="svg"))

    # DAG execution: returns a DataFrame with one column per requested output
    df_result = dr.execute(output_columns, inputs=inputs)
    df_result.to_csv("/opt/ml/processing/output/output_table.csv")

examples/aws/sagemaker/requirements.txt (+1)

@@ -0,0 +1 @@
sagemaker
