diff --git a/notebooks/images/add_the_repository.jpg b/notebooks/images/add_the_repository.jpg new file mode 100644 index 0000000..98dcb0e Binary files /dev/null and b/notebooks/images/add_the_repository.jpg differ diff --git a/notebooks/images/governance.jpg b/notebooks/images/governance.jpg new file mode 100644 index 0000000..7863cfe Binary files /dev/null and b/notebooks/images/governance.jpg differ diff --git a/notebooks/images/monitoring_job.jpg b/notebooks/images/monitoring_job.jpg new file mode 100644 index 0000000..06818b3 Binary files /dev/null and b/notebooks/images/monitoring_job.jpg differ diff --git a/notebooks/images/monitoring_result.jpg b/notebooks/images/monitoring_result.jpg new file mode 100644 index 0000000..bf1f337 Binary files /dev/null and b/notebooks/images/monitoring_result.jpg differ diff --git a/notebooks/images/repository.jpg b/notebooks/images/repository.jpg new file mode 100644 index 0000000..0c552ec Binary files /dev/null and b/notebooks/images/repository.jpg differ diff --git a/notebooks/images/runtime.jpg b/notebooks/images/runtime.jpg new file mode 100644 index 0000000..f50d219 Binary files /dev/null and b/notebooks/images/runtime.jpg differ diff --git a/notebooks/images/studio_model.jpg b/notebooks/images/studio_model.jpg new file mode 100644 index 0000000..cbd0d82 Binary files /dev/null and b/notebooks/images/studio_model.jpg differ diff --git a/notebooks/images/violations.jpg b/notebooks/images/violations.jpg new file mode 100644 index 0000000..01a172e Binary files /dev/null and b/notebooks/images/violations.jpg differ diff --git a/notebooks/model_bias_monitor.ipynb b/notebooks/model_bias_monitor.ipynb new file mode 100644 index 0000000..c307aa6 --- /dev/null +++ b/notebooks/model_bias_monitor.ipynb @@ -0,0 +1,2319 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d6eccd85-1c0e-4f7b-9da0-dd473f8a9d95", + "metadata": {}, + "source": [ + "# Model Bias Monitoring with AWS SageMaker Clarify" + ] + }, + { + "cell_type": 
"markdown", + "id": "67557cb8-fe01-473a-a256-442ab3131197", + "metadata": {}, + "source": [ + "This Jupyter notebook shows how to monitor model bias with AWS SageMaker (based on the [docs](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability.html)).\n", + "\n", + "1. Configuration\n", + "1. Dataset description\n", + "1. Deploying the model for ML observability\n", + "1. Setting up the monitoring job - creates monitoring tasks by computing a baseline and scheduling regular monitoring\n", + "1. Generating traffic - sends traffic (examples and ground truth) to the endpoint, from which bias metrics are calculated\n", + "1. Cleaning up - removes all the created resources\n", + "\n", + "Prerequisites:\n", + "\n", + "- Existing roles with all the needed permissions (S3, SageMaker, etc.);\n", + "- A configured SageMaker Domain;\n", + "- A SageMaker Studio user.\n", + "\n", + "You can use SageMaker Studio (a Jupyter-like environment) to run this notebook. To do that, follow these steps:\n", + "\n", + "1. Launch SageMaker Studio\n", + "2. Clone this repository (https://github.com/griddynamics/gd-ml-observability.git)\n", + "\n", + "\n", + "![Add the repository](images/add_the_repository.jpg)\n", + "\n", + "![repository](images/repository.jpg)\n", + "\n", + "3. Run this notebook cell by cell, paying attention to the comments" + ] + }, + { + "cell_type": "markdown", + "id": "9096958f-3dbf-41b6-96bd-dce1fe2bb299", + "metadata": {}, + "source": [ + "### Background" + ] + }, + { + "cell_type": "markdown", + "id": "30192a0e-66ad-4f5f-b1fa-10296809ef81", + "metadata": {}, + "source": [ + "A computer system powered by a machine learning model might contain bias, i.e., discriminate against certain individuals or groups of individuals. Models learn from data, so they might memorize bias that appears in that data. Biased judgements affect people badly and unfairly, and automating such a process could lead to disastrous consequences. 
Hence, people developing such systems must be aware of this risk and be able to address biased models. [\\*](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-detect-data-bias.html)\n", + "\n", + "AWS provides SageMaker Clarify to identify and measure bias by calculating [pre-training](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bias.html) and [post-training](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html) bias metrics from the model inputs, outputs, and ground truth collected during monitoring, and comparing them to baseline metrics calculated in advance." + ] + }, + { + "cell_type": "markdown", + "id": "4d6d91b3-4a1a-4384-b836-cdf69c6382f4", + "metadata": {}, + "source": [ + "## 1. Configuration" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "db0aa694-f45f-4051-b253-1870d150a9ab", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: boto3 in /opt/conda/lib/python3.7/site-packages (1.26.80)\n", + "Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from boto3) (0.6.0)\n", + "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.7/site-packages (from boto3) (1.0.1)\n", + "Requirement already satisfied: botocore<1.30.0,>=1.29.80 in /opt/conda/lib/python3.7/site-packages (from boto3) (1.29.80)\n", + "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.7/site-packages (from botocore<1.30.0,>=1.29.80->boto3) (1.26.14)\n", + "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.7/site-packages (from botocore<1.30.0,>=1.29.80->boto3) (2.8.2)\n", + "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.30.0,>=1.29.80->boto3) (1.14.0)\n", + 
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", + "\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.0.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install --upgrade boto3" + ] + }, + { + "cell_type": "markdown", + "id": "3dccd774-9ef4-477a-a409-4e8a4606ee94", + "metadata": {}, + "source": [ + "### Importing necessary libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23465867-bed7-4cef-8699-3eeab0903008", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import copy\n", + "import json\n", + "import random\n", + "import time\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import sagemaker\n", + "\n", + "from datetime import datetime\n", + "\n", + "from IPython.display import display\n", + "from sagemaker import get_execution_role, image_uris\n", + "from sagemaker.clarify import (\n", + " BiasConfig,\n", + " DataConfig,\n", + " ModelConfig,\n", + " ModelPredictedLabelConfig,\n", + ")\n", + "from sagemaker.model import Model\n", + "from sagemaker.model_monitor import (\n", + " BiasAnalysisConfig,\n", + " CronExpressionGenerator,\n", + " DataCaptureConfig,\n", + " EndpointInput,\n", + " ModelBiasMonitor,\n", + ")\n", + "from sagemaker.s3 import S3Downloader, S3Uploader\n" + ] + }, + { + "cell_type": "markdown", + "id": "735c2543-a393-467c-a0f8-c7c6d6c3a9a1", + "metadata": {}, + "source": [ + "### Setting up necessary variables and 
constants" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "be24abcc-7a7f-429b-b635-a33c30a39625", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "sagemaker_session = sagemaker.session.Session()\n", + "sagemaker_client = sagemaker_session.sagemaker_client\n", + "sagemaker_runtime_client = sagemaker_session.sagemaker_runtime_client\n", + "role = get_execution_role()\n", + "\n", + "DATA_BUCKET = 'adp-rnd-ml-datasets'\n", + "# If you run this code outside Grid Dynamics, change the following bucket names to ones you have access to.\n", + "STAGE_BUCKET = 'adp-rnd-ml-stage'\n", + "MODEL_BUCKET = 'adp-rnd-ml-models'\n", + "MODEL_PATH = f\"s3://{MODEL_BUCKET}/xgboost-for-loan-default-data/output/xgboost-for-loan-default-data-2023-02-24-08-18-30-629/output/model.tar.gz\"\n", + "DATASET_PATH = f\"s3://{DATA_BUCKET}/loan_default/dataset.zip\"\n", + "TEST_PATH = f\"s3://{DATA_BUCKET}/loan_default/test.csv\"\n", + "VALIDATION_PATH = f\"s3://{DATA_BUCKET}/loan_default/eval.csv\"\n", + "\n", + "NOW = datetime.now()\n", + "\n", + "%matplotlib inline\n", + "plt.style.use(\"seaborn-pastel\")" + ] + }, + { + "cell_type": "markdown", + "id": "f06b3c0a-117f-415f-b30b-a52938c0c623", + "metadata": {}, + "source": [ + "## 2. Dataset" + ] + }, + { + "cell_type": "markdown", + "id": "6414debb-84e5-4f6a-9914-fe83ad09300b", + "metadata": {}, + "source": [ + "The Loan Default Dataset was used for this work. It is published under the CC0: Public Domain license, so both non-profit and commercial use are permitted. It consists of 33 features of different types (categorical/binary, continuous/decimal), including the binary target feature **Status**, which is used for the classification task. The dataset has a set of columns with missing values, which need to be handled before it is used with machine learning models. 
It also has outliers, which need to be handled, and it is likely to contain data leakage: some features allow the task to be solved with 100% accuracy, so such features should be removed." + ] + }, + { + "cell_type": "markdown", + "id": "6c5acb59-42b8-49d9-986d-1790c3fe0c9d", + "metadata": {}, + "source": [ + "### Basic EDA" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "957b1e1e-004a-40fb-b96c-300e12153d00", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Int64Index: 148670 entries, 24890 to 173559\n", + "Data columns (total 33 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 year 148670 non-null int64 \n", + " 1 loan_limit 145326 non-null object \n", + " 2 Gender 148670 non-null object \n", + " 3 approv_in_adv 147762 non-null object \n", + " 4 loan_type 148670 non-null object \n", + " 5 loan_purpose 148536 non-null object \n", + " 6 Credit_Worthiness 148670 non-null object \n", + " 7 open_credit 148670 non-null object \n", + " 8 business_or_commercial 148670 non-null object \n", + " 9 loan_amount 148670 non-null int64 \n", + " 10 rate_of_interest 112231 non-null float64\n", + " 11 Interest_rate_spread 112031 non-null float64\n", + " 12 Upfront_charges 109028 non-null float64\n", + " 13 term 148629 non-null float64\n", + " 14 Neg_ammortization 148549 non-null object \n", + " 15 interest_only 148670 non-null object \n", + " 16 lump_sum_payment 148670 non-null object \n", + " 17 property_value 133572 non-null float64\n", + " 18 construction_type 148670 non-null object \n", + " 19 occupancy_type 148670 non-null object \n", + " 20 Secured_by 148670 non-null object \n", + " 21 total_units 148670 non-null object \n", + " 22 income 139520 non-null float64\n", + " 23 credit_type 148670 non-null object \n", + " 24 Credit_Score 148670 non-null int64 \n", + " 25 co-applicant_credit_type 148670 non-null object \n", + " 26 age 148470 non-null object \n", + " 27 
submission_of_application 148470 non-null object \n", + " 28 LTV 133572 non-null float64\n", + " 29 Region 148670 non-null object \n", + " 30 Security_Type 148670 non-null object \n", + " 31 Status 148670 non-null int64 \n", + " 32 dtir1 124549 non-null float64\n", + "dtypes: float64(8), int64(4), object(21)\n", + "memory usage: 38.6+ MB\n" + ] + } + ], + "source": [ + "dataset = pd.read_csv(DATASET_PATH, compression='zip', index_col='ID')\n", + "dataset.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "f770040e-9539-4709-b744-72b522d2dc73", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " year loan_limit Gender approv_in_adv loan_type \\\n", + "ID \n", + "24890 2019 cf Sex Not Available nopre type1 \n", + "24891 2019 cf Male nopre type2 \n", + "24892 2019 cf Male pre type1 \n", + "24893 2019 cf Male nopre type1 \n", + "24894 2019 cf Joint pre type1 \n", + "\n", + " loan_purpose Credit_Worthiness open_credit business_or_commercial \\\n", + "ID \n", + "24890 p1 l1 nopc nob/c \n", + "24891 p1 l1 nopc b/c \n", + "24892 p1 l1 nopc nob/c \n", + "24893 p4 l1 nopc nob/c \n", + "24894 p1 l1 nopc nob/c \n", + "\n", + " loan_amount ... credit_type Credit_Score co-applicant_credit_type \\\n", + "ID ... \n", + "24890 116500 ... EXP 758 CIB \n", + "24891 206500 ... EQUI 552 EXP \n", + "24892 406500 ... EXP 834 CIB \n", + "24893 456500 ... EXP 587 CIB \n", + "24894 696500 ... CRIF 602 EXP \n", + "\n", + " age submission_of_application LTV Region Security_Type \\\n", + "ID \n", + "24890 25-34 to_inst 98.728814 south direct \n", + "24891 55-64 to_inst NaN North direct \n", + "24892 35-44 to_inst 80.019685 south direct \n", + "24893 45-54 not_inst 69.376900 North direct \n", + "24894 25-34 not_inst 91.886544 North direct \n", + "\n", + " Status dtir1 \n", + "ID \n", + "24890 1 45.0 \n", + "24891 1 NaN \n", + "24892 0 46.0 \n", + "24893 0 42.0 \n", + "24894 0 39.0 \n", + "\n", + "[5 rows x 33 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Few instances of dataset\n", + "dataset.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "1e542bba-9c2e-4df8-aaad-0ae5d8c085dd", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " year loan_amount rate_of_interest Interest_rate_spread \\\n", + "count 148670.0 1.486700e+05 112231.000000 112031.000000 \n", + "mean 2019.0 3.311177e+05 4.045476 0.441656 \n", + "std 0.0 1.839093e+05 0.561391 0.513043 \n", + "min 2019.0 1.650000e+04 0.000000 -3.638000 \n", + "25% 2019.0 1.965000e+05 3.625000 0.076000 \n", + "50% 2019.0 2.965000e+05 3.990000 0.390400 \n", + "75% 2019.0 4.365000e+05 4.375000 0.775400 \n", + "max 2019.0 3.576500e+06 8.000000 3.357000 \n", + "\n", + " Upfront_charges term property_value income \\\n", + "count 109028.000000 148629.000000 1.335720e+05 139520.000000 \n", + "mean 3224.996127 335.136582 4.978935e+05 6957.338876 \n", + "std 3251.121510 58.409084 3.599353e+05 6496.586382 \n", + "min 0.000000 96.000000 8.000000e+03 0.000000 \n", + "25% 581.490000 360.000000 2.680000e+05 3720.000000 \n", + "50% 2596.450000 360.000000 4.180000e+05 5760.000000 \n", + "75% 4812.500000 360.000000 6.280000e+05 8520.000000 \n", + "max 60000.000000 360.000000 1.650800e+07 578580.000000 \n", + "\n", + " Credit_Score LTV Status dtir1 \n", + "count 148670.000000 133572.000000 148670.000000 124549.000000 \n", + "mean 699.789103 72.746457 0.246445 37.732932 \n", + "std 115.875857 39.967603 0.430942 10.545435 \n", + "min 500.000000 0.967478 0.000000 5.000000 \n", + "25% 599.000000 60.474860 0.000000 31.000000 \n", + "50% 699.000000 75.135870 0.000000 39.000000 \n", + "75% 800.000000 86.184211 0.000000 45.000000 \n", + "max 900.000000 7831.250000 1.000000 61.000000 " + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Shows statistics of numerical features\n", + "dataset.describe() " + ] + }, + { + "cell_type": "markdown", + "id": "f2b8a2dd-5ddd-4a76-b6a1-1f1e9c0e21a3", + "metadata": {}, + "source": [ + "The plots below give an intuition for the distribution of the sensitive features before preprocessing (handling missing values and one-hot encoding):" + ] + }, + { + "cell_type": 
"code", + "execution_count": 8, + "id": "2e2da775-d508-4583-8212-5884bd0193e5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "SENSITIVE_COLUMNS = ['Gender', 'occupancy_type', 'age'] # + \"income\", but processed separately, as different type\n", + "\n", + "figs, axs = plt.subplots(4, 1, figsize=(9, 12))\n", + "for i, column in enumerate(SENSITIVE_COLUMNS):\n", + " data = dataset[column].value_counts()\n", + " axs[i].barh(data.index, data.values)\n", + " axs[i].set_title(column, pad=3)\n", + "\n", + "axs[-1].hist(dataset['income'], bins=100, log=True)\n", + "axs[-1].set_title('income', pad=3)\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "id": "6d2b35a3-6e7d-4850-b091-9a1c006a18ce", + "metadata": {}, + "source": [ + "## 3. Deploying model" + ] + }, + { + "cell_type": "markdown", + "id": "99413c71-e1e1-4a76-ae4b-582a25c0344c", + "metadata": {}, + "source": [ + "Setting up the data capture config so that model bias can be monitored from the stored data" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "7176b64d-d8cc-4c80-83fa-977a9c9b0c83", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "S3 key: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28\n" + ] + } + ], + "source": [ + "container = sagemaker.image_uris.retrieve(\"xgboost\", sagemaker_session.boto_region_name, \"1.5-1\")\n", + "\n", + "prefix = f'xgboost-for-loan-default-data-{NOW:%Y-%m-%d}'\n", + "s3_key = f\"s3://{STAGE_BUCKET}/{prefix}\"\n", + "print(f\"S3 key: {s3_key}\")\n", + "s3_capture_upload_path = f\"{s3_key}/datacapture\"\n", + "\n", + "data_capture_config = DataCaptureConfig(\n", + " enable_capture=True,\n", + " sampling_percentage=100,\n", + " destination_s3_uri=s3_capture_upload_path,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e509f304-701d-435e-9062-86564f893d86", + "metadata": {}, + "source": [ + "Creating and deploying a Model instance for the XGBoost classifier model, which will then be used for ML 
Observability" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "aa52d932-3ec3-479b-8c62-24eadb74945b", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-------!" + ] + } + ], + "source": [ + "xgb_model_name = f'loan-default-xgboost-{NOW:%Y-%m-%d}'\n", + "xgb_endpoint_name = f'{xgb_model_name}-endpoint'\n", + "\n", + "xgb_model = Model(\n", + " image_uri=container,\n", + " model_data=MODEL_PATH,\n", + " role=role,\n", + " name=xgb_model_name,\n", + " sagemaker_session=sagemaker_session\n", + ")\n", + "\n", + "xgb_model.deploy(\n", + " endpoint_name=xgb_endpoint_name,\n", + " model_name=xgb_model_name,\n", + " initial_instance_count=1, \n", + " instance_type=\"ml.m4.xlarge\",\n", + " data_capture_config=data_capture_config,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "fef6ee74-b0fe-4b4b-a463-06eacd2c30dd", + "metadata": {}, + "source": [ + "### Results" + ] + }, + { + "cell_type": "markdown", + "id": "9bbec4fa-af38-4985-ac47-d35f9798e850", + "metadata": { + "tags": [] + }, + "source": [ + "As a result, you will get a Loan Default model and an endpoint. You can find them in the AWS Console -> SageMaker -> Governance -> Model cards. The new model should appear with the specified endpoint (and that's all for now).\n", + "\n", + "![governance](images/governance.jpg)\n", + "\n", + "\\* Here and below, all screenshots are for illustration only; your names and values may differ." + ] + }, + { + "cell_type": "markdown", + "id": "5bcab79a-8deb-4734-8838-37ce22bccc53", + "metadata": {}, + "source": [ + "## 4. 
Setting up monitoring job" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "653cb277-1a13-47c0-9030-b390dc52c6be", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Capture path: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/datacapture\n", + "Ground truth path: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/ground_truth_data\n", + "Report path: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports\n", + "Baseline results uri: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/baselining\n" + ] + } + ], + "source": [ + "monitoring_prefix = f\"{prefix}/ClarifyModelMonitor-{NOW:%Y-%m-%d}\"\n", + "\n", + "ground_truth_upload_path = f\"{s3_key}/ground_truth_data\"\n", + "s3_report_path = f\"{s3_key}/reports\"\n", + "\n", + "print(f\"Capture path: {s3_capture_upload_path}\")\n", + "print(f\"Ground truth path: {ground_truth_upload_path}\")\n", + "print(f\"Report path: {s3_report_path}\")\n", + "\n", + "baseline_results_uri = f\"{s3_key}/baselining\"\n", + "print(f\"Baseline results uri: {baseline_results_uri}\")\n", + "model_bias_baselining_job_result_uri = f\"{baseline_results_uri}/model_bias\"\n" + ] + }, + { + "cell_type": "markdown", + "id": "d1a9ee97-66cb-44a2-82eb-6ba9fb258759", + "metadata": {}, + "source": [ + "After preprocessing some columns were dropped, some were transformed into numerical by converting binary columns to 0/1 columns, and some were one-hot encoded:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "9c456e85-3d8c-4e6d-b740-e4bf7177f117", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "TARGET_COLUMN = 'Status'\n", + "ALL_INPUT_COLUMNS = ['Status', 'loan_amount', 'term', 'income', 'Credit_Score', 'loan_limit',\n", + " 'approv_in_adv', 'Credit_Worthiness', 'open_credit',\n", + " 'business_or_commercial', 'Neg_ammortization', 'interest_only',\n", + " 
'lump_sum_payment', 'Secured_by', 'co-applicant_credit_type',\n", + " 'submission_of_application', 'Gender_Female', 'Gender_Joint',\n", + " 'Gender_Male', 'Gender_Sex Not Available', 'loan_purpose_p1',\n", + " 'loan_purpose_p2', 'loan_purpose_p3', 'loan_purpose_p4',\n", + " 'occupancy_type_ir', 'occupancy_type_pr', 'occupancy_type_sr',\n", + " 'total_units_1U', 'total_units_2U', 'total_units_3U', 'total_units_4U',\n", + " 'age_25-34', 'age_35-44', 'age_45-54', 'age_55-64', 'age_65-74',\n", + " 'age_gt74', 'age_lt25']\n", + "DATASET_TYPE = \"text/csv\"" + ] + }, + { + "cell_type": "markdown", + "id": "a92fd3c2-48e9-4c2f-9906-854d1875a480", + "metadata": {}, + "source": [ + "Creating and configuring monitor:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1978e0a4-2324-4ee0-901d-4a3fcf5591dd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_bias_monitor = ModelBiasMonitor(\n", + " role=role,\n", + " sagemaker_session=sagemaker_session,\n", + " max_runtime_in_seconds=1800,\n", + ")\n", + "\n", + "model_bias_data_config = DataConfig(\n", + " s3_data_input_path=VALIDATION_PATH,\n", + " s3_output_path=model_bias_baselining_job_result_uri,\n", + " label=TARGET_COLUMN,\n", + " headers=ALL_INPUT_COLUMNS,\n", + " dataset_type=DATASET_TYPE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "eb2a0667-9f49-4254-9d99-c3dfa299239f", + "metadata": {}, + "source": [ + "Setting up facets - sensitive features based on which model bias will be calculated and later observed" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2398faa8-73c9-411f-add7-9eea111978bd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "FACET_COLUMNS_VALUES = {\n", + " facet: values \n", + " for facet, values\n", + " in zip(\n", + " [\n", + " 'Gender_Female', 'Gender_Joint', 'Gender_Male', 'Gender_Sex Not Available', \n", + " 'income',\n", + " 'occupancy_type_ir', 'occupancy_type_pr', 'occupancy_type_sr',\n", + " 
'age_25-34', 'age_35-44', 'age_45-54', 'age_55-64', 'age_65-74', 'age_gt74', 'age_lt25'\n", + " ],\n", + " [\n", + " [1], [1], [1], [1], # Gender - binary data, assessing only positive values of one-hot-encoded features\n", + " [25000.], # income - threshold splitting into 2 groups with income less than and more than 25000.0\n", + " [1], [1], [1], # occupancy_type - binary data, assessing only positive values of one-hot-encoded features\n", + " [1], [1], [1], [1], [1], [1], [1] # age - binary data, assessing only positive values of one-hot-encoded features\n", + " ]\n", + " )\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3740e90b-cb2a-4b12-912c-c4f1b2ab7a99", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_bias_config = BiasConfig(\n", + " # In the target feature Status, 1 means loan default.\n", + " # Thus, we choose 0 as the positive outcome so that the bias metrics make sense\n", + " label_values_or_threshold=[0],\n", + " facet_name=list(FACET_COLUMNS_VALUES.keys()),\n", + " facet_values_or_threshold=list(FACET_COLUMNS_VALUES.values()),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "35d4de0a-0df7-47d3-8197-f5693f7cecef", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_predicted_label_config = ModelPredictedLabelConfig(probability_threshold=.5)\n", + "model_config = ModelConfig(\n", + " instance_count=1,\n", + " instance_type=\"ml.m4.xlarge\",\n", + " content_type=DATASET_TYPE,\n", + " accept_type=DATASET_TYPE,\n", + " endpoint_name=xgb_endpoint_name,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "3738f518-c0d6-4515-ba22-3e52a8bcba25", + "metadata": {}, + "source": [ + "### Creating baseline" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "222ad0c1-ef20-479c-8bf2-038ddea23425", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker:Creating 
processing-job with name baseline-suggestion-job-2023-02-28-15-46-10-999\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + ".............................................................................!\n" + ] + } + ], + "source": [ + "model_bias_monitor.suggest_baseline(\n", + " model_config=model_config,\n", + " data_config=model_bias_data_config,\n", + " bias_config=model_bias_config,\n", + " model_predicted_label_config=model_predicted_label_config,\n", + ")\n", + "\n", + "model_bias_monitor.latest_baselining_job.wait(logs=False)\n", + "model_bias_constraints = model_bias_monitor.suggested_constraints()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "6a70d948-f54a-4643-9495-564c7f80e28b", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ModelBiasMonitor suggested constraints: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/baselining/model_bias/analysis.json\n", + "Visualizing part of suggested constrains:\n", + "{\n", + " \"version\": \"1.0\",\n", + " \"post_training_bias_metrics\": {\n", + " \"label\": \"Status\",\n", + " \"facets\": {\n", + " \"Gender_Female\": [\n", + " {\n", + " \"value_or_threshold\": \"1\",\n", + " \"metrics\": [\n", + " {\n", + " \"name\": \"AD\",\n", + " \"description\": \"Accuracy Difference (AD)\",\n", + " \"value\": -0.012274862880233273\n", + " },\n", + " {\n", + " \"name\": \"CDDPL\",\n", + " \"description\": \"Conditional Demographic Disparity in Predicted Labels (CDDPL)\",\n", + " \"value\": null,\n", + " \"error\": \"Group variable is empty or not provided\"\n", + " },\n", + " {\n", + " \"name\": \"DAR\",\n", + " \"description\": \"Difference in Acceptance Rates (DAR)\",\n", + " \"value\": -0.019687953899706012\n", + " },\n", + " {\n", + " \"name\": \"DCA\",\n", + " \"description\": \"Difference in Conditional Acceptance (DCA)\",\n", + " \"value\": -0.035234696157777856\n", + " },\n", + " {\n", + " 
\"name\": \"DCR\",\n", + " \"description\": \"Difference in Conditional Rejection (DCR)\",\n", + " \"value\": -0.5736707206241256\n", + " },\n", + " {\n", + " \"name\": \"DI\",\n", + " \"description\": \"Disparate Impact (DI)\",\n", + " \"value\": 0.9544013145312146\n", + " },\n", + " {\n", + " \"name\": \"DPPL\",\n", + " \"description\": \"Difference in Positive Proportions in Predicted Labels (DPPL)\",\n", + " \"value\": 0.040580765079404446\n", + " },\n", + " {\n", + " \"name\": \"DRR\",\n", + " \"description\": \"Difference in Rejection Rates (DRR)\",\n", + " \"value\": -0.010994514578744008\n", + " },\n", + " {\n", + " \"name\": \"FT\",\n", + " \"description\": \"Flip Test (FT)\",\n", + " \"value\": -0.11717435993272285\n", + " },\n", + " {\n", + " \"name\": \"GE\",\n", + " \"description\": \"Generalized Entropy (GE)\",\n", + " \"value\": 0.06776540220552933\n", + " },\n", + " {\n", + " \"name\": \"RD\",\n", + " \"description\": \"Recall Difference (RD)\",\n", + " \"value\": 0.01614804633418354\n", + " },\n", + " {\n", + " \"name\": \"SD\",\n", + " \"description\": \"Specificity Difference (SD)\",\n", + " \"value\": 0.10832164460419041\n", + " },\n", + " {\n", + " \"name\": \"TE\",\n", + " \"description\": \"Treatment Equality (TE)\",\n", + " \"value\": 0.11461989439613904\n", + " }\n", + " ]\n", + " }\n", + " ],\n", + " \"income\": \"...\"\n", + " }\n", + " },\n", + " \"pre_training_bias_metrics\": {\n", + " \"label\": \"Status\",\n", + " \"facets\": {\n", + " \"Gender_Female\": [\n", + " {\n", + " \"value_or_threshold\": \"1\",\n", + " \"metrics\": [\n", + " {\n", + " \"name\": \"CDDL\",\n", + " \"description\": \"Conditional Demographic Disparity in Labels (CDDL)\",\n", + " \"value\": null,\n", + " \"error\": \"Group variable is empty or not provided\"\n", + " },\n", + " {\n", + " \"name\": \"CI\",\n", + " \"description\": \"Class Imbalance (CI)\",\n", + " \"value\": 0.6389460544516042\n", + " },\n", + " {\n", + " \"name\": \"DPL\",\n", + " \"description\": 
\"Difference in Positive Proportions in Labels (DPL)\",\n", + " \"value\": 0.004437718747468233\n", + " },\n", + " {\n", + " \"name\": \"JS\",\n", + " \"description\": \"Jensen-Shannon Divergence (JS)\",\n", + " \"value\": 1.3179244474955353e-05\n", + " },\n", + " {\n", + " \"name\": \"KL\",\n", + " \"description\": \"Kullback-Liebler Divergence (KL)\",\n", + " \"value\": 5.261255142023462e-05\n", + " },\n", + " {\n", + " \"name\": \"KS\",\n", + " \"description\": \"Kolmogorov-Smirnov Distance (KS)\",\n", + " \"value\": 0.004437718747468261\n", + " },\n", + " {\n", + " \"name\": \"LP\",\n", + " \"description\": \"L-p Norm (LP)\",\n", + " \"value\": 0.006275882038666939\n", + " },\n", + " {\n", + " \"name\": \"TVD\",\n", + " \"description\": \"Total Variation Distance (TVD)\",\n", + " \"value\": 0.004437718747468247\n", + " }\n", + " ]\n", + " }\n", + " ],\n", + " \"income\": \"...\"\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "print(f\"ModelBiasMonitor suggested constraints: {model_bias_constraints.file_s3_uri}\")\n", + "print(\"Visualizing part of suggested constrains:\")\n", + "suggested_constrains = json.loads(S3Downloader.read_file(model_bias_constraints.file_s3_uri))\n", + "res = {k: v for k, v in suggested_constrains.items() if 'metrics' not in k}\n", + "res['post_training_bias_metrics'] = dict(label='Status', facets=dict(Gender_Female=suggested_constrains['post_training_bias_metrics']['facets']['Gender_Female']))\n", + "res['pre_training_bias_metrics'] = dict(label='Status', facets=dict(Gender_Female=suggested_constrains['pre_training_bias_metrics']['facets']['Gender_Female']))\n", + "res['post_training_bias_metrics']['facets']['income'] = '...'\n", + "res['pre_training_bias_metrics']['facets']['income'] = '...'\n", + "print(json.dumps(res, indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "67cc455b-fed0-493f-8ce2-fa78aa8bd7c4", + "metadata": {}, + "source": [ + "### Setting up monitoring job" + ] + }, + { + "cell_type": "code", + 
"execution_count": 19, + "id": "9444f5ad-4a5a-4ea5-8677-32930a1cf101", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker.model_monitor.clarify_model_monitoring:Uploading analysis config to {s3_uri}.\n", + "INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: monitoring-schedule-2023-02-28-15-53-44-998\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model bias monitoring schedule: monitoring-schedule-2023-02-28-15-53-44-998\n" + ] + } + ], + "source": [ + "model_bias_analysis_config = None\n", + "if not model_bias_monitor.latest_baselining_job:\n", + " model_bias_analysis_config = BiasAnalysisConfig(\n", + " model_bias_config,\n", + " headers=all_headers,\n", + " label=label_header,\n", + " )\n", + "\n", + "model_bias_monitor.create_monitoring_schedule(\n", + " analysis_config=model_bias_analysis_config,\n", + " output_s3_uri=s3_report_path,\n", + " endpoint_input=EndpointInput(\n", + " endpoint_name=xgb_endpoint_name,\n", + " destination=\"/opt/ml/processing/input/endpoint\",\n", + " start_time_offset=\"-PT1H\",\n", + " end_time_offset=\"-PT0H\",\n", + " probability_threshold_attribute=0.5,\n", + " ),\n", + " ground_truth_input=ground_truth_upload_path,\n", + " schedule_cron_expression=CronExpressionGenerator.hourly(),\n", + ")\n", + "print(f\"Model bias monitoring schedule: {model_bias_monitor.monitoring_schedule_name}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c01ea526-ef99-4adb-9b97-2ee3a3a1d90e", + "metadata": {}, + "source": [ + "### Results" + ] + }, + { + "cell_type": "markdown", + "id": "54f4f8d1-5466-434e-9bee-2dc7600c73b2", + "metadata": {}, + "source": [ + "Model monitoring job will be created. 
It can be found in the model dashboard (by opening the model card) or in SageMaker Studio:\n", + "SageMaker Studio -> Home -> Deployments -> Endpoint -> _Model name_ -> Model bias\n", + "\n", + "\n", + "![studio_model](images/studio_model.jpg)\n", + "\n", + "\n", + "\"monitoring_job\"\n", + "\n", + "After some time, once several reports have been generated (after section 5), the bias metric charts will appear below." + ] + }, + { + "cell_type": "markdown", + "id": "19261039-b5df-4a45-8f6d-7d6d5f3c8c87", + "metadata": {}, + "source": [ + "## 5. Generating traffic" + ] + }, + { + "cell_type": "markdown", + "id": "0bf5d84b-4fcb-49bc-be27-155944a51cea", + "metadata": {}, + "source": [ + "Prepare the data used to check for model bias. It can be any data supported by the model." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "1a7ed27b-94d2-4f09-8f84-099d3549032e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "test_df = pd.read_csv(TEST_PATH)\n", + "for col in FACET_COLUMNS_VALUES:\n", + " if col == 'income':\n", + " continue\n", + " test_df[col] = test_df[col].astype(int)\n", + "\n", + "gender_aspects = [col for col in FACET_COLUMNS_VALUES if 'Gender' in col]\n", + "descr_colnames = ['metric_kind', 'good', 'min', 'max', 'description',]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a3921c14-32da-4b89-871b-480d916e390a", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Serialization helpers that let Clarify infer the types of attributes\n", + "def cast_str(x):\n", + " if x in [1.0, 1, 0, 0.0]:\n", + " return str(int(x))\n", + " return str(x)\n", + "\n", + "def serialize_example(example):\n", + " return ','.join(map(cast_str, example))\n", + "\n", + "# For reproducibility\n", + "np.random.seed(42)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "be41b804-1bfc-4c73-b356-089f7257b55d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ +
"import threading\n", + "\n", + "class WorkerThread(threading.Thread):\n", + " def __init__(self, do_run, *args, **kwargs):\n", + " super(WorkerThread, self).__init__(*args, **kwargs)\n", + " self.__do_run = do_run\n", + " self.__terminate_event = threading.Event()\n", + "\n", + " def terminate(self):\n", + " self.__terminate_event.set()\n", + "\n", + " def run(self):\n", + " while not self.__terminate_event.is_set():\n", + " self.__do_run(self.__terminate_event)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "f0d9f9f1-3d80-483b-b33f-dcb1197de03b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def invoke(terminate_event):\n", + " # Getting predictions from deployed model for test dataframe\n", + " for i, row in test_df.drop(columns=[TARGET_COLUMN]).iterrows():\n", + " payload = serialize_example(row)\n", + " response = sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=xgb_endpoint_name,\n", + " Body=payload,\n", + " ContentType=DATASET_TYPE,\n", + " InferenceId=str(i)\n", + " )\n", + " prediction = response[\"Body\"].read()\n", + " time.sleep(0.01)\n", + " if terminate_event.is_set():\n", + " break\n", + "\n", + "def real_ground_truth_with_id(inference_id):\n", + " # Ground truth requires following format:\n", + " return {\n", + " \"groundTruthData\": {\n", + " \"data\": str(int(test_df.iloc[inference_id][TARGET_COLUMN])),\n", + " \"encoding\": \"CSV\",\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": str(inference_id),\n", + " },\n", + " \"eventVersion\": \"0\",\n", + " }\n", + "\n", + "\n", + "def upload_ground_truth(upload_time):\n", + " records = [real_ground_truth_with_id(i) for i in range(len(test_df))]\n", + " records = [json.dumps(r) for r in records]\n", + " data_to_upload = \"\\n\".join(records)\n", + " target_s3_uri = f\"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl\"\n", + " S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)\n", + "\n", + "def 
generate_ground_truth(terminate_event):\n", + " invoke(terminate_event)\n", + " upload_ground_truth(datetime.utcnow())\n", + " # Sleep up to one hour in one-minute steps so termination is noticed quickly\n", + " for _ in range(60):\n", + " time.sleep(60)\n", + " if terminate_event.is_set():\n", + " break" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "28ba35fd-2c42-4799-8664-a671190dc0fd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "ground_truth_thread = WorkerThread(do_run=generate_ground_truth)\n", + "ground_truth_thread.start()" + ] + }, + { + "cell_type": "markdown", + "id": "b7d0e8d9-f546-4a67-b67d-bc6305aa338a", + "metadata": {}, + "source": [ + "### Results" + ] + }, + { + "cell_type": "markdown", + "id": "544a1b16-7f68-4ba7-9264-a4a238880b6c", + "metadata": {}, + "source": [ + "After generating some traffic, we need to wait until it is processed before we can visualize the results." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "f3ab7723-75a3-42ee-a32b-22b084e164f5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def wait_for_execution_to_start(model_monitor):\n", + " print(\n", + " \"A hourly schedule will kick off executions ON the hour (plus 0 - 20 min buffer).\"\n", + " )\n", + "\n", + " print(\"Waiting for the first execution to happen\", end=\"\")\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " while \"LastMonitoringExecutionSummary\" not in schedule_desc:\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " print()\n", + " print(\"Done! Execution has been created\")\n", + "\n", + " print(\"Now waiting for execution to start\", end=\"\")\n", + " while schedule_desc[\"LastMonitoringExecutionSummary\"][\"MonitoringExecutionStatus\"] == \"Pending\":\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(10)\n", + "\n", + " print()\n", + " print(\"Done!
Execution has started\")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "aca68c10-401d-42f9-9495-56dcc859fbc8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Waits for the schedule to have last execution in a terminal status.\n", + "def wait_for_execution_to_finish(model_monitor):\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + " if execution_summary is not None:\n", + " print(\"Waiting for execution to finish\", end=\"\")\n", + " while execution_summary[\"MonitoringExecutionStatus\"] not in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + " \"Failed\",\n", + " \"Stopped\",\n", + " ]:\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc[\"LastMonitoringExecutionSummary\"]\n", + " print()\n", + " print(\"Done! Execution has finished\")\n", + " else:\n", + " print(\"Last execution not found\")\n", + " return schedule_desc" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "490a911a-0d69-4d63-b3eb-7e2e9cf0e0ed", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "A hourly schedule will kick off executions ON the hour (plus 0 - 20 min buffer).\n", + "Waiting for the first execution to happen........\n", + "Done! Execution has been created\n", + "Now waiting for execution to start\n", + "Done! Execution has started\n", + "Waiting for execution to finish...........\n", + "Done! 
Execution has finished\n" + ] + } + ], + "source": [ + "wait_for_execution_to_start(model_bias_monitor)\n", + "schedule_desc = wait_for_execution_to_finish(model_bias_monitor)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c7ceb6d1-2984-4942-8826-eed29041bf2e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "metrix_values = pd.DataFrame.from_dict({\n", + " # Post-training metrics: [ideal value, min, max, description]\n", + " 'AD': [0, -1, 1, 'Accuracy Difference (AD)'],\n", + " 'DAR': [0, -1, 1, 'Difference in Acceptance Rates (DAR)'],\n", + " 'DCA': [0, -np.inf, np.inf, 'Difference in Conditional Acceptance (DCA)'],\n", + " 'DCR': [0, -np.inf, np.inf, 'Difference in Conditional Rejection (DCR)'],\n", + " 'DI': [1, 0, np.inf, 'Disparate Impact (DI)'],\n", + " 'DPPL': [0, -1, 1, 'Difference in Positive Proportions in Predicted Labels (DPPL)'],\n", + " 'DRR': [0, -1, 1, 'Difference in Rejection Rates (DRR)'],\n", + " 'FT': [0, -1, 1, 'Flip Test (FT)'],\n", + " 'GE': [0, 0, .5, 'Generalized Entropy (GE)'],\n", + " 'RD': [0, -1, 1, 'Recall Difference (RD)'],\n", + " 'SD': [0, -1, 1, 'Specificity Difference (SD)'],\n", + " 'TE': [0, -np.inf, np.inf, 'Treatment Equality (TE)'],\n", + " # Pre-training metrics: [ideal value, min, max, description]\n", + " 'CI': [0, -1, 1, 'Class Imbalance (CI)'],\n", + " 'DPL': [0, -1, 1, 'Difference in Positive Proportions in Labels (DPL)'],\n", + " 'JS': [0, 0, np.inf, 'Jensen-Shannon Divergence (JS)'],\n", + " 'KL': [0, 0, np.inf, 'Kullback-Leibler Divergence (KL)'],\n", + " 'KS': [0, 0, 1, 'Kolmogorov-Smirnov Distance (KS)'],\n", + " 'LP': [0, 0, np.inf, 'L-p Norm (LP)'],\n", + " 'TVD': [0, 0, np.inf, 'Total Variation Distance (TVD)'],\n", + "}, orient='index', columns=['good', 'min', 'max', 'description'])\n", + "\n", + "def extract_metrix_from_analysis(analysis_dict):\n", + " post_training = analysis_dict['post_training_bias_metrics']\n", + " pre_training = analysis_dict['pre_training_bias_metrics']\n", + " facets = post_training['facets'].keys()\n", + " frames = dict()\n", + " \n", + " for
metric_kind in ['post_training_bias_metrics', 'pre_training_bias_metrics']:\n", + " metric = analysis_dict[metric_kind]\n", + " frames[metric_kind] = dict()\n", + " for facet in facets:\n", + " frames[metric_kind][facet] = []\n", + " for facet_analysis in metric['facets'][facet]:\n", + " frames[metric_kind][facet].append(\n", + " pd.DataFrame(facet_analysis['metrics'])\n", + " .drop(columns=['error', 'description'])\n", + " .set_index('name')\n", + " # as all features are binary:\n", + " .rename(columns={'value': facet})\n", + " )\n", + " frames[metric_kind][facet] = pd.concat(frames[metric_kind][facet], axis=1)\n", + " frames[metric_kind] = pd.concat(frames[metric_kind].values(), axis=1)\n", + " frames[metric_kind]['metric_kind'] = metric_kind\n", + " return pd.concat(frames.values()).join(metrix_values, how='inner')" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "305c2eae-6df8-48a0-8317-755e85ceee46", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Report URI: s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16\n", + "Found Report Files:\n", + "s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16/analysis.json\n", + " s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16/constraint_violations.json\n", + " s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16/report.html\n", + " 
s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16/report.ipynb\n", + " s3://adp-rnd-ml-stage/xgboost-for-loan-default-data-2023-02-28/reports/loan-default-xgboost-2023-02-28-endpoint/monitoring-schedule-2023-02-28-15-53-44-998/2023/02/28/16/report.pdf\n", + "\n", + "Gender aspects pre-training bias metrics:\n" + ] + }, + { + "data": { + "text/html": [
| | Gender_Female | Gender_Joint | Gender_Male | Gender_Sex Not Available | metric_kind | good | min | max | description |
|---|---|---|---|---|---|---|---|---|---|
| CI | 0.631107 | 0.439681 | 0.433699 | 0.495513 | pre_training_bias_metrics | 0 | -1.000000 | 1.000000 | Class Imbalance (CI) |
| DPL | -0.029181 | -0.098880 | 0.025691 | 0.101349 | pre_training_bias_metrics | 0 | -1.000000 | 1.000000 | Difference in Positive Proportions in Labels (DPL) |
| JS | 0.000573 | 0.006849 | 0.000427 | 0.006377 | pre_training_bias_metrics | 0 | 0.000000 | inf | Jensen-Shannon Divergence (JS) |
| KL | 0.002322 | 0.028928 | 0.001691 | 0.024665 | pre_training_bias_metrics | 0 | 0.000000 | inf | Kullback-Leibler Divergence (KL) |
| KS | 0.029181 | 0.098880 | 0.025691 | 0.101349 | pre_training_bias_metrics | 0 | 0.000000 | 1.000000 | Kolmogorov-Smirnov Distance (KS) |
| LP | 0.041269 | 0.139838 | 0.036333 | 0.143329 | pre_training_bias_metrics | 0 | 0.000000 | inf | L-p Norm (LP) |
| TVD | 0.029181 | 0.098880 | 0.025691 | 0.101349 | pre_training_bias_metrics | 0 | 0.000000 | inf | Total Variation Distance (TVD) |
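
In the pre-training table, the KS and TVD rows repeat the absolute value of the DPL row, and LP is about 1.414 times that value. For a binary label this is expected: the label distribution of each facet has only two entries, so the distances collapse to functions of a single difference. The sketch below illustrates this under the assumption that the metrics follow their usual definitions (DPL as a difference of positive proportions, KS as the maximum absolute difference, TVD as half the L1 distance, LP as the L2 distance). The function name `pretraining_metrics` and the input proportions are hypothetical and are not taken from the notebook or the SageMaker SDK.

```python
import math


def pretraining_metrics(p_a, p_d):
    """Pre-training distance metrics for a binary label.

    p_a, p_d: share of positive labels in the two facets being compared
    (illustrative inputs, not values from the monitored dataset).
    """
    dist_a = (p_a, 1.0 - p_a)  # label distribution of the first facet
    dist_d = (p_d, 1.0 - p_d)  # label distribution of the second facet
    diffs = [a - d for a, d in zip(dist_a, dist_d)]
    return {
        "DPL": p_a - p_d,                            # signed difference
        "KS": max(abs(x) for x in diffs),            # max absolute difference
        "TVD": 0.5 * sum(abs(x) for x in diffs),     # half of the L1 distance
        "LP": math.sqrt(sum(x * x for x in diffs)),  # L2 distance
    }


metrics = pretraining_metrics(p_a=0.70, p_d=0.73)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With two label values, `KS` and `TVD` both equal `abs(DPL)` and `LP` equals `sqrt(2) * abs(DPL)`, which matches the pattern in the table (for example 0.029181 and 0.041269 for `Gender_Female`).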
\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Gender aspects post-training bias metrics:\n" + ] + }, + { + "data": { + "text/html": [
| | Gender_Female | Gender_Joint | Gender_Male | Gender_Sex Not Available | metric_kind | good | min | max | description |
|---|---|---|---|---|---|---|---|---|---|
| AD | -0.036073 | -0.053750 | -0.022003 | 0.109918 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Accuracy Difference (AD) |
| DAR | -0.054765 | -0.046989 | -0.028326 | 0.123350 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Difference in Acceptance Rates (DAR) |
| DCA | -0.079445 | -0.009282 | -0.053351 | 0.125789 | post_training_bias_metrics | 0 | -inf | inf | Difference in Conditional Acceptance (DCA) |
| DCR | -0.868890 | 2.435185 | -1.064791 | 1.119565 | post_training_bias_metrics | 0 | -inf | inf | Difference in Conditional Rejection (DCR) |
| DI | 0.948794 | 1.125680 | 0.907514 | 1.013672 | post_training_bias_metrics | 1 | 0.000000 | inf | Disparate Impact (DI) |
| DPPL | 0.045510 | -0.106880 | 0.083611 | -0.011995 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Difference in Positive Proportions in Predicted Labels (DPPL) |
| DRR | -0.046609 | 0.027778 | 0.034953 | -0.013975 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Difference in Rejection Rates (DRR) |
| FT | -0.145946 | -0.017794 | -0.158451 | -0.102767 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Flip Test (FT) |
| GE | 0.071114 | 0.071114 | 0.071114 | 0.071114 | post_training_bias_metrics | 0 | 0.000000 | 0.500000 | Generalized Entropy (GE) |
| RD | 0.023645 | -0.044927 | 0.026070 | 0.004010 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Recall Difference (RD) |
| SD | 0.152032 | -0.207411 | 0.213436 | -0.149188 | post_training_bias_metrics | 0 | -1.000000 | 1.000000 | Specificity Difference (SD) |
| TE | 0.228039 | -0.166453 | 0.171154 | -0.110849 | post_training_bias_metrics | 0 | -inf | inf | Treatment Equality (TE) |
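
The styling code in this notebook dims values that lie close to the ideal one, so that the remaining values stand out. The same rule can be written as a plain filter that returns the metric names worth a closer look. This is a minimal sketch, assuming a tolerance of 0.1 around the ideal value (0 for the difference metrics, 1 for Disparate Impact). The names `IDEAL`, `TOLERANCE` and `flag_metrics` are illustrative and not part of the notebook or the SageMaker SDK. The input values are the `Gender_Female` column of the post-training table above.

```python
# Ideal value per metric; anything not listed is assumed to be ideal at 0.
IDEAL = {"DI": 1.0}
# Assumed tolerance, mirroring the +/-0.1 band used by the notebook's styling rule.
TOLERANCE = 0.1


def flag_metrics(values, tolerance=TOLERANCE):
    """Return the names of metrics that lie outside ideal +/- tolerance."""
    flagged = []
    for name, value in values.items():
        ideal = IDEAL.get(name, 0.0)
        if abs(value - ideal) >= tolerance:
            flagged.append(name)
    return flagged


# Gender_Female column of the post-training bias metrics table.
gender_female = {
    "AD": -0.036073, "DAR": -0.054765, "DCA": -0.079445, "DCR": -0.868890,
    "DI": 0.948794, "DPPL": 0.045510, "DRR": -0.046609, "FT": -0.145946,
    "GE": 0.071114, "RD": 0.023645, "SD": 0.152032, "TE": 0.228039,
}

print(flag_metrics(gender_female))  # ['DCR', 'FT', 'SD', 'TE']
```

A fixed tolerance is only a starting point. Unbounded metrics such as DCR or TE may need a wider band than metrics confined to [-1, 1].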
\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "analysis_df = None\n", + "\n", + "execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + "if execution_summary and execution_summary[\"MonitoringExecutionStatus\"] in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + "]:\n", + " last_model_bias_monitor_execution = model_bias_monitor.list_executions()[-1]\n", + " last_model_bias_monitor_execution_report_uri = (\n", + " last_model_bias_monitor_execution.output.destination\n", + " )\n", + " print(f\"Report URI: {last_model_bias_monitor_execution_report_uri}\")\n", + " last_model_bias_monitor_execution_report_files = sorted(\n", + " S3Downloader.list(last_model_bias_monitor_execution_report_uri)\n", + " )\n", + " print(\"Found Report Files:\")\n", + " print(\"\\n \".join(last_model_bias_monitor_execution_report_files))\n", + " if execution_summary[\"MonitoringExecutionStatus\"] == \"CompletedWithViolations\":\n", + " file = [name for name in last_model_bias_monitor_execution_report_files if 'analysis.json' in name]\n", + " assert len(file) == 1, 'Error: analysis file was not generated'\n", + " file = file[0]\n", + " analysis = json.loads(S3Downloader.read_file(file))\n", + " analysis_df = extract_metrix_from_analysis(analysis)\n", + " print('Gender aspects pre-training bias metrics:')\n", + " display(analysis_df[gender_aspects + descr_colnames].query(\"metric_kind == 'pre_training_bias_metrics'\")\n", + " .style.applymap(lambda v: 'opacity: 20%;' if (v < 0.1) and (v > -0.1) else None, subset=gender_aspects))\n", + " print('Gender aspects post-training bias metrics:')\n", + " idx = pd.IndexSlice\n", + " display(analysis_df[gender_aspects + descr_colnames].query(\"metric_kind == 'post_training_bias_metrics'\")\n", + " .style.applymap(lambda v: 'opacity: 20%;' if (v < 0.1) and (v > -0.1) else None, subset=gender_aspects)\n", + " .applymap(lambda v: 'opacity: 20%;'
if (v < 1.1) and (v > 0.9) else None, subset=idx[idx['DI'], idx[gender_aspects]]))\n", + " \n", + "else:\n", + " last_model_bias_monitor_execution = None\n", + " print(\n", + " \"====STOP==== \\n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "9e78d581-bc25-4767-9af3-771f0e765b5d", + "metadata": {}, + "source": [ + "Some values stand out (those shown at full opacity), especially the **Difference in Conditional Rejection (DCR)** metric. This means that the model issues negative predictions differently across genders, i.e., the model is biased and this needs to be addressed. The pre-training metrics, by contrast, look normal overall." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "8d1bd888-f68e-4c12-af36-9d1686a3a366", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Let's print the first few of violations:\n", + "{\n", + " \"version\": \"1.0\",\n", + " \"violations\": [\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"DPL\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.029181259499107992 doesn't meet the baseline constraint requirement 0.004437718747468233\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"JS\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.0005725270156258485 doesn't meet the baseline constraint requirement 1.3179244474955353e-05\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"KL\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.0023215021891724794 doesn't meet the baseline constraint
requirement 5.261255142023462e-05\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"KS\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.029181259499107992 doesn't meet the baseline constraint requirement 0.004437718747468261\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"LP\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.041268532950767156 doesn't meet the baseline constraint requirement 0.006275882038666939\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"TVD\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.029181259499107937 doesn't meet the baseline constraint requirement 0.004437718747468247\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"AD\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.036073481794753226 doesn't meet the baseline constraint requirement -0.012274862880233273\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"DAR\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.05476492787359355 doesn't meet the baseline constraint requirement -0.019687953899706012\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"DCA\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.07944485592353545 doesn't meet the baseline constraint requirement -0.035234696157777856\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " 
\"facet_value\": \"1\",\n", + " \"metric_name\": \"DCR\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.8688897309586965 doesn't meet the baseline constraint requirement -0.5736707206241256\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"DPPL\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.045509812991475496 doesn't meet the baseline constraint requirement 0.040580765079404446\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"DRR\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.04660856384994316 doesn't meet the baseline constraint requirement -0.010994514578744008\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"FT\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value -0.14594594594594595 doesn't meet the baseline constraint requirement -0.11717435993272285\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"GE\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.07111418898122515 doesn't meet the baseline constraint requirement 0.06776540220552933\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"RD\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.023645182352392546 doesn't meet the baseline constraint requirement 0.01614804633418354\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"SD\",\n", + " \"constraint_check_type\": 
\"bias_drift_check\",\n", + " \"description\": \"Metric value 0.15203216692023475 doesn't meet the baseline constraint requirement 0.10832164460419041\"\n", + " },\n", + " {\n", + " \"facet\": \"Gender_Female\",\n", + " \"facet_value\": \"1\",\n", + " \"metric_name\": \"TE\",\n", + " \"constraint_check_type\": \"bias_drift_check\",\n", + " \"description\": \"Metric value 0.228039041703638 doesn't meet the baseline constraint requirement 0.11461989439613904\"\n", + " }\n", + " ]\n", + "}\n" + ] + } + ], + "source": [ + "if last_model_bias_monitor_execution:\n", + " model_bias_violations = last_model_bias_monitor_execution.constraint_violations()\n", + " if model_bias_violations:\n", + " print(\"Let's print the first few of violations:\")\n", + " violations_part = copy.deepcopy(model_bias_violations.body_dict)\n", + " violations_part['violations'] = [violation for violation in violations_part['violations'] if violation['facet'] == 'Gender_Female']\n", + " print(json.dumps(violations_part, indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "7074f498-bb0a-408f-8c8d-469deb0dbd58", + "metadata": {}, + "source": [ + "After some time, the monitoring job results appear in the Monitoring Job History tab.\n", + "\n", + "\"monitoring_result\"\n", + "\n", + "Clicking on a result shows additional details: which metric constraints were violated compared to the baseline.\n", + "\n", + "\"violations\"" + ] + }, + { + "cell_type": "markdown", + "id": "88b9969c-cc15-49af-8e57-c5af9e950775", + "metadata": { + "tags": [] + }, + "source": [ + "## 6. Cleaning up" + ] + }, + { + "cell_type": "markdown", + "id": "cd0bf892-8bf7-431a-a9f1-843f4a9c776c", + "metadata": {}, + "source": [ + "We now know how to set up model bias monitoring with AWS SageMaker tools, so we can remove all created resources to avoid unexpected bills at the end of the month."
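
A monitoring schedule cannot be deleted while one of its executions is still in flight, so cleanup starts with a wait. The sketch below shows one way to poll with an upper time limit, so that a stuck execution does not block the notebook indefinitely. The helper `wait_until_idle` and its parameters are hypothetical. Only the status strings `Pending` and `InProgress` come from this notebook.

```python
import time


def wait_until_idle(get_status, busy=("Pending", "InProgress"),
                    poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns a status that is not in `busy`.

    get_status: zero-argument callable returning the current status string
                (or None when no execution exists yet).
    Raises TimeoutError if the status is still busy after timeout_seconds.
    """
    waited = 0
    status = get_status()
    while status in busy:
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Demonstration with a canned sequence of statuses instead of a real schedule.
fake_statuses = iter(["Pending", "InProgress", "Completed"])
final = wait_until_idle(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)  # Completed
```

In this notebook, `get_status` could wrap `model_bias_monitor.describe_schedule()` and read `MonitoringExecutionStatus` from `LastMonitoringExecutionSummary`, returning `None` when that key is absent.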
+ ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "ebf8c362-688e-422c-8c37-ae07c0313231", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Stopping Monitoring Schedule with name: monitoring-schedule-2023-02-28-15-53-44-998\n" + ] + } + ], + "source": [ + "# First, wait until the running monitoring job finishes,\n", + "# because the monitoring schedule can't be deleted before that\n", + "while model_bias_monitor.describe_schedule()['LastMonitoringExecutionSummary']['MonitoringExecutionStatus'] in ['Pending', 'InProgress']:\n", + " time.sleep(60)\n", + "\n", + "model_bias_monitor.stop_monitoring_schedule()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "4e373fc3-3fc1-4632-bdbe-00e6dea10d4c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Deleting Monitoring Schedule with name: monitoring-schedule-2023-02-28-15-53-44-998\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker.model_monitor.clarify_model_monitoring:Deleting Model Bias Job Definition with name: model-bias-job-definition-2023-02-28-15-53-44-998\n" + ] + } + ], + "source": [ + "# Turn off traffic generation\n", + "ground_truth_thread.terminate()\n", + "# Delete the monitoring schedule\n", + "model_bias_monitor.delete_monitoring_schedule()\n", + "# Remove the endpoint, its configuration, and the model\n", + "sagemaker_client.delete_endpoint(EndpointName=xgb_endpoint_name)\n", + "sagemaker_client.delete_endpoint_config(EndpointConfigName=xgb_endpoint_name)\n", + "sagemaker_client.delete_model(ModelName=xgb_model_name);" + ] + }, + { + "cell_type": "markdown", + "id": "8b9a2e0b-7454-4109-bdec-1dd00fa0a290", + "metadata": {}, + "source": [ + "Don't forget to stop the current runtime if you are working in SageMaker Studio:\n", + "\n", + "\"runtime\"" + ] + } + ], + "metadata": { + "instance_type": "ml.t3.medium", + "kernelspec": 
{ + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}