diff --git a/docs/tutorials/kserve-basic.md b/docs/tutorials/kserve-basic.md
index 51c2c79..545a137 100644
--- a/docs/tutorials/kserve-basic.md
+++ b/docs/tutorials/kserve-basic.md
@@ -39,7 +39,7 @@ export FUSEML_SERVER_URL=http://$(kubectl get VirtualService -n fuseml-core fuse
 ## 3. Fetch the FuseML examples code
 
 ```bash
-git clone --depth 1 -b release-0.3 https://github.com/fuseml/examples.git
+git clone --depth 1 -b main https://github.com/fuseml/examples.git
 cd examples
 ```
 
diff --git a/docs/tutorials/seldon-core.md b/docs/tutorials/seldon-core.md
index 6ddde34..d5db1df 100644
--- a/docs/tutorials/seldon-core.md
+++ b/docs/tutorials/seldon-core.md
@@ -39,7 +39,7 @@ export FUSEML_SERVER_URL=http://$(kubectl get VirtualService -n fuseml-core fuse
 ## 3. Fetch the FuseML examples code
 
 ```bash
-git clone --depth 1 -b release-0.3 https://github.com/fuseml/examples.git
+git clone --depth 1 -b main https://github.com/fuseml/examples.git
 cd examples
 ```
 
diff --git a/docs/workflows/kserve-predictor.md b/docs/workflows/kserve-predictor.md
index 01336b8..065c82b 100644
--- a/docs/workflows/kserve-predictor.md
+++ b/docs/workflows/kserve-predictor.md
@@ -13,12 +13,125 @@ The KServe predictor step expects a model URL to be supplied as input, pointing
 
 The predictor performs the following tasks:
 
 - downloads the model locally from the MLflow artifact store
-- if so instructed, it auto-detects the model format based on the information stored in the MLflow artifact store and decides which KServe predictor engine to use for it. Otherwise, it validates the model format against the type of predictor engine specified as input.
+- if so instructed (i.e. the `predictor` input parameter is omitted or explicitly set to `auto`), it auto-detects the model format based on the information stored in the MLflow artifact store and decides which KServe predictor engine to use for it. Otherwise, it validates the model format against the type of predictor engine specified as input.
 - it performs some minor conversion tasks required to adapt the input MLflow model directory layout to the one required by KServe
+- it uploads the converted model to the same artifact store as the original model, in a different location (the converted model is stored in a subdirectory of the original model's location)
 - it creates a KServe prediction service to serve the model
-- finally, it registers the KServe prediction service with FuseML as an Application object. Information about the Application, such as the type and exposed inference URL can be retrieved at any time using the FuseML API and CLI.
+- finally, it registers the KServe prediction service with FuseML as an Application object. Information about the Application, such as the type and exposed inference URL can be retrieved [through the FuseML CLI](../cli.md#applications) or [through the REST API](../api.md).
+
+The KServe predictor has a single output: the URL where the prediction service can be accessed to process inference requests.
+
+The Dockerfile and associated scripts that implement the KServe predictor container image are available in the [FuseML extensions repository](https://github.com/fuseml/extensions/tree/main/images/inference-services/kserve).
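+
+As a quick illustration, the prediction URL returned by the workflow can be exercised with any HTTP client. The sketch below is hypothetical (the host name and the `data.json` payload are placeholders, not values produced by FuseML) and assumes a model exposed through the KServe v1 REST inference protocol; the exact request schema depends on the predictor engine that was selected:
+
+```bash
+# Hypothetical prediction URL taken from the workflow's prediction-url output
+PREDICTION_URL=http://mlflow-test.fuseml-workloads.example.com/v1/models/mlflow-test:predict
+
+# The "instances" payload follows the KServe v1 inference protocol and must
+# match the input signature of the served model
+cat > data.json <<'EOF'
+{"instances": [[6.3, 3.3, 6.0, 2.5]]}
+EOF
+
+curl -s -d @data.json "$PREDICTION_URL"
+```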
+
+The KServe predictor is featured in a number of FuseML tutorials, such as:
+
+- [Logistic Regression with MLFlow & KServe](../tutorials/kserve-basic.md)
+- [Training & Serving ML Models on GPU with NVIDIA Triton](../tutorials/kserve-triton-gpu.md)
+- [Benchmarking ML Models on Intel CPUs with Intel OpenVINO](../tutorials/openvino-mlflow.md)
 
 ## Using the KServe Predictor Step
 
-TBD
+The recommended way to use the KServe predictor step in a FuseML workflow is to have an MLflow trainer step as part of the same workflow and to reference its output model as input to the KServe predictor, as shown in the example below.
+
+```yaml
+name: mlflow-e2e
+description: |
+  End-to-end pipeline template that takes in an MLFlow compatible codeset,
+  runs the MLFlow project to train a model, then creates a KServe prediction
+  service that can be used to run predictions against the model.
+inputs:
+  - name: mlflow-codeset
+    description: an MLFlow compatible codeset
+    type: codeset
+  - name: predictor
+    description: type of predictor engine
+    type: string
+    default: auto
+outputs:
+  - name: prediction-url
+    description: "The URL where the exposed prediction service endpoint can be contacted to run predictions."
+    type: string
+steps:
+  - name: builder
+    image: ghcr.io/fuseml/mlflow-builder:latest
+    inputs:
+      - name: mlflow-codeset
+        codeset:
+          name: "{{ inputs.mlflow-codeset }}"
+          path: /project
+    outputs:
+      - name: image
+  - name: trainer
+    image: "{{ steps.builder.outputs.image }}"
+    inputs:
+      - name: mlflow-codeset
+        codeset:
+          name: "{{ inputs.mlflow-codeset }}"
+          path: "/project"
+    outputs:
+      - name: mlflow-model-url
+    extensions:
+      - name: mlflow-tracking
+        product: mlflow
+        service_resource: mlflow-tracking
+      - name: mlflow-store
+        product: mlflow
+        service_resource: s3
+  - name: predictor
+    image: ghcr.io/fuseml/kserve-predictor:latest
+    inputs:
+      - name: model
+        value: "{{ steps.trainer.outputs.mlflow-model-url }}"
+      - name: predictor
+        value: "{{ inputs.predictor }}"
+      - name: app_name
+        value: "{{ inputs.mlflow-codeset.name }}-{{ inputs.mlflow-codeset.project }}"
+    outputs:
+      - name: prediction-url
+    extensions:
+      - name: mlflow-s3-store
+        product: mlflow
+        service_resource: s3
+      - name: kserve
+        service_resource: kserve-api
+```
+
+Aside from the mandatory input `model` parameter that needs to be a URL pointing to the location of a trained ML model saved in an MLflow artifact store, the KServe predictor workflow step also accepts the following optional input parameters that can be used to customize how the prediction service is created and updated:
+
+- `predictor`: this can be used to configure the type of KServe predictor engine used for the prediction service. This can take the following values:
+
+    - `auto` (default): when this value is used, the KServe predictor will automatically detect the type of model from the MLflow metadata present in the artifact store and use the appropriate predictor engine: TensorFlow Serving for TensorFlow models, the scikit-learn predictor for scikit-learn pickled models and the Triton back-end for Keras and ONNX models
+    - `tensorflow`: use to serve models with the TensorFlow Serving engine. Only works with models trained with TensorFlow or Keras and saved using the TensorFlow saved_model model format.
+    - `sklearn`: use to serve models trained with scikit-learn and saved in the sklearn pickled model format
+    - `triton`: use the NVidia Triton prediction back-end. Works with models trained with TensorFlow or Keras and saved using the TensorFlow saved_model format and with models in ONNX format.
+
+- `app_name`: use this to explicitly set the name of the FuseML application used to represent the KServe prediction service. Its value also determines the prediction URL as well as the names of the Kubernetes resources created by the KServe predictor. Our example uses an expression to dynamically set the `app_name` parameter to the name and project of the MLflow codeset used as workflow input. If not set, the application name is constructed by combining the workflow name with the name and project of the input codeset, if one is provided as input. In the absence of an input codeset, the application name is generated by concatenating the workflow name with a randomly generated string.
+
+    !!! note
+
+        Choosing a value for the `app_name` parameter should be done with care, as it is used to uniquely identify a FuseML application and its associated Kubernetes resources (i.e. the name of the KServe prediction service, prediction URL etc.). Reusing the same value in several workflows can lead to a situation where the same KServe prediction service is managed by more than one FuseML workflow. In this case, the results can be unpredictable, because multiple workflows will compete over managing the same application.
+
+    !!! warning
+
+        If an `app_name` value is not provided and the predictor step doesn't receive an input codeset, the generated application name will be random, which means that every workflow run will create a new application and prediction service. This should be avoided, as it can easily lead to resource exhaustion.
+
+- `runtime_version` - use to explicitly set the version of the KServe runtime (i.e. predictor container image) for the prediction service. If not set, the runtime version is automatically determined from the information available in the MLflow model store for some model formats (e.g. the TensorFlow Serving runtime version is set to match the tensorflow library version).
+- `resources_limits` - use to set the Kubernetes resource limits to allocate hardware resources to the prediction service. E.g:
+
+    ```yaml
+    - name: resources_limits
+      value: '{nvidia.com/gpu: 1}'
+    ```
+
+- `verbose` - set to `true` to enable verbose logging in the predictor workflow step (default is `false`).
+
+The KServe predictor workflow step can also take in some environment variables that are used to configure the credentials of the remote MLflow artifact store where the input ML model is stored:
+
+!!! note
+
+    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the [FuseML Extension Registry](../extensions/extension-registry.md) and only referenced in the FuseML workflows as [extension requirements](../extensions/extension-registry.md#referencing-extensions-in-workflows).
+
+- `MLFLOW_S3_ENDPOINT_URL` - required when the ML model is stored in a custom S3 compatible artifact store such as minio
+- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for the AWS S3 and S3-compatible artifact store
+
+Observe how the `mlflow-s3-store` extension requirement is used in the `predictor` step to reference an MLflow artifact store backend registered in the [FuseML Extension Registry](../extensions/extension-registry.md). This avoids having to configure credentials and other environment variables explicitly in the FuseML workflow. The FuseML workflow engine automatically resolves these references to matching records available in the FuseML Extension Registry and passes the configuration entries in the extension records as environment variables to the workflow step container (i.e. variables like `MLFLOW_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).
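+
+For reference, once the workflow has run, the Application registered by the predictor step (including its exposed inference URL) can be looked up from the command line. This is only a minimal sketch, assuming the `application` subcommands described in the [FuseML CLI reference](../cli.md#applications):
+
+```bash
+# List the applications registered by FuseML workflows; the output includes
+# the application type and the exposed prediction URL for each entry
+fuseml application list
+```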
diff --git a/docs/workflows/mlflow-builder.md b/docs/workflows/mlflow-builder.md
index b98451d..13e36d1 100644
--- a/docs/workflows/mlflow-builder.md
+++ b/docs/workflows/mlflow-builder.md
@@ -35,6 +35,15 @@ The MLflow builder workflow step leverages the MLflow Project conventions to aut
 
 The MLflow builder has a single output: the container registry repository and the image tag where the built MLflow environment container image is stored. This output can be used in subsequent workflow steps to run the MLflow code from the same codeset as the one used as input. The most common use for the resulted container image is executing code that trains and validates ML models. For this reason, the output container image is often referred to as a "trainer" workflow step.
 
+The Dockerfile and associated scripts that implement the MLflow builder container image are available in the [FuseML extensions repository](https://github.com/fuseml/extensions/tree/main/images/builders/mlflow).
+
+The MLflow builder is featured in a number of FuseML tutorials, such as:
+
+- [Logistic Regression with MLFlow & KServe](../tutorials/kserve-basic.md)
+- [Logistic Regression with MLFlow & Seldon-Core](../tutorials/seldon-core.md)
+- [Training & Serving ML Models on GPU with NVIDIA Triton](../tutorials/kserve-triton-gpu.md)
+- [Benchmarking ML Models on Intel CPUs with Intel OpenVINO](../tutorials/openvino-mlflow.md)
+
 ## Using the MLflow Builder Step
 
 Here is an example of a FuseML workflow that builds an MLflow runtime environment container image out of an MLflow compatible codeset and returns the location where it's stored in the internal FuseML container registry:
@@ -91,7 +100,7 @@ The MLflow runtime workflow step can also take in additional environment variabl
 
 !!! note
 
-    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be configured in the FuseML Extension Registry and only referenced in the FuseML workflows as extension requirements.
+    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the [FuseML Extension Registry](../extensions/extension-registry.md) and only referenced in the FuseML workflows as [extension requirements](../extensions/extension-registry.md#referencing-extensions-in-workflows).
 
 - `MLFLOW_TRACKING_URI` - the URL of a remote MLflow tracking server to use.
 - `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` - username and password to use with HTTP Basic authentication to authenticate with the remote MLflow tracking server.
@@ -144,5 +153,5 @@ steps:
 
 Note how the `builder` step output is referenced as the image value for the `trainer` step and how both steps use the same `mlflow-codeset` codeset as input. The builder workflow step creates the MLflow environment container image and the trainer step uses it to execute the MLflow code and train the ML model.
 
-Also observe how the `mlflow-tracking` and `mlflow-store` extensions are used in the `trainer` step to reference an MLflow tracking server and an artifact store backend configured in the FuseML Extension Registry. This avoids having to configure credentials and other environment variables explicitly in the FuseML workflow. The FuseML workflow engine automatically resolves these references to matching records available in the FuseML Extension Registry and passes the configuration entries in the extension records as environment variables to the workflow step container (i.e. variables like `MLFLOW_TRACKING_URI` , `MLFLOW_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).
+Also observe how the `mlflow-tracking` and `mlflow-store` extensions are used in the `trainer` step to reference an MLflow tracking server and an artifact store backend configured in the [FuseML Extension Registry](../extensions/extension-registry.md). This avoids having to configure credentials and other environment variables explicitly in the FuseML workflow. The FuseML workflow engine automatically resolves these references to matching records available in the FuseML Extension Registry and passes the configuration entries in the extension records as environment variables to the workflow step container (i.e. variables like `MLFLOW_TRACKING_URI`, `MLFLOW_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).
diff --git a/docs/workflows/ovms-converter.md b/docs/workflows/ovms-converter.md
index 5b9b71d..8df89e2 100644
--- a/docs/workflows/ovms-converter.md
+++ b/docs/workflows/ovms-converter.md
@@ -20,9 +20,72 @@ The converter performs the following tasks:
 
 - downloads the model locally from the remote artifact store.
 - if the model is stored in an MLflow remote store and if so instructed, it auto-detects the format of the input model based on the information stored in the MLflow artifact store.
 - it converts/optimizes the model using the provided OpenVINO Model Optimizer tools.
-- it uploads the converted model to the output remote artifact store.
+- it uploads the converted model to an artifact store. It can be the same artifact store as the original model, or a different one.
+
+The OVMS converter has a single output: the URL where the converted ML model is stored.
+
+The Dockerfile and associated scripts that implement the OVMS converter container image are available in the [FuseML extensions repository](https://github.com/fuseml/extensions/tree/main/images/converters/ovms).
+
+The OVMS converter is featured in a number of FuseML tutorials, such as:
+
+- [Benchmarking ML Models on Intel CPUs with Intel OpenVINO](../tutorials/openvino-mlflow.md)
+- [FuseML Extension Development Use-Case - OpenVINO](../tutorials/openvino-extensions.md)
 
 ## Using the OVMS Converter Step
 
-TBD
+The following is a step in a FuseML workflow that is used to convert a model stored in an MLFlow artifact store to the IR format supported by the OpenVINO Model Server.
+
+```yaml
+steps:
+  [...]
+  - name: converter
+    image: ghcr.io/fuseml/ovms-converter:latest
+    inputs:
+      - name: input_model
+        value: '{{ steps.trainer.outputs.mlflow-model-url }}'
+      - name: output_model
+        value: '{{ steps.trainer.outputs.mlflow-model-url }}/ovms'
+      - name: input_format
+        value: '{{ inputs.model-format }}'
+      - name: batch
+        value: 1 # OpenVINO cannot work with undefined input dimensions
+      - name: extra_args
+        # Disabling the implicit transpose transformation allows the input model shape
+        # to be consistent with those used by other serving platforms
+        value: "--disable_nhwc_to_nchw"
+    outputs:
+      - name: ovms-model-url
+    extensions:
+      - name: mlflow-store
+        product: mlflow
+        service_resource: s3
+    env:
+      - name: S3_ENDPOINT
+        value: '{{ extensions.mlflow-store.cfg.MLFLOW_S3_ENDPOINT_URL }}'
+  [...]
+```
+
+Aside from the mandatory input `input_model` parameter that needs to be a URL pointing to the location of a trained ML model saved in a remote artifact store or object storage service, the OVMS converter workflow step also accepts the following optional input parameters that can be used to customize how the input ML model is converted and stored:
+
+- `input_format`: specifies the format for the input ML model. This can take the following values:
+
+    - if set to `auto` (default), the OVMS converter expects the model to be stored in an MLflow artifact store. It will attempt to automatically detect the model format from the MLflow metadata present in the artifact store.
+    - `tensorflow.saved_model`: TensorFlow saved_model model format.
+    - `onnx`: ONNX model format
+
+- `input_shape`, `scale`, `reverse_input_channels`, `log_level`, `input`, `output`, `mean_values`, `scale_values`, `data_type`, `batch` and `static_shape` correspond to [OpenVINO Model Optimizer generic conversion parameters](https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model.html#general-conversion-parameters) that can be configured to customize the conversion process.
+- `extra_args`: additional command line arguments that are passed to the OpenVINO Model Optimizer utility.
+
+The OVMS converter workflow step can also take in some environment variables that are used to configure the credentials of the remote artifact store(s) where the input and output ML models are stored:
+
+!!! note
+
+    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the [FuseML Extension Registry](../extensions/extension-registry.md) and only referenced in the FuseML workflows as [extension requirements](../extensions/extension-registry.md#referencing-extensions-in-workflows).
+
+- `S3_ENDPOINT` - required when the ML model is stored in a custom S3 compatible artifact store such as minio
+- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for the AWS S3 and S3-compatible artifact store
+- `OUTPUT_S3_ENDPOINT` - required when the output ML model must be uploaded to an S3 artifact store that is different from the input artifact store
+- `OUTPUT_AWS_ACCESS_KEY_ID` and `OUTPUT_AWS_SECRET_ACCESS_KEY` - credentials for the S3 output artifact store, when different from the input artifact store
+
+Observe how the `mlflow-store` extension is used in the `converter` step to reference an MLflow artifact store backend registered in the [FuseML Extension Registry](../extensions/extension-registry.md). This avoids having to configure credentials and other environment variables explicitly in the FuseML workflow. The FuseML workflow engine automatically resolves these references to matching records available in the FuseML Extension Registry and passes the configuration entries in the extension records as environment variables to the workflow step container (i.e. variables like `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`). Other environment variables need to be explicitly mapped to the workflow step container using the `env` parameter (i.e. `MLFLOW_S3_ENDPOINT_URL` is mapped to `S3_ENDPOINT`).
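+
+To round out the picture, the conversion performed by this step is roughly equivalent to invoking the OpenVINO Model Optimizer by hand on the downloaded model. The sketch below is only an illustration (the local paths are placeholders and the exact flags depend on the Model Optimizer version bundled in the converter image):
+
+```bash
+# Hypothetical manual equivalent of the converter step above: optimize a
+# TensorFlow saved_model into IR format, mirroring the batch and extra_args
+# inputs from the workflow step
+mo --saved_model_dir /tmp/input-model \
+   --batch 1 \
+   --disable_nhwc_to_nchw \
+   --output_dir /tmp/converted-model
+```
+
+The resulting `.xml` and `.bin` IR files are then uploaded to the location given by the `output_model` parameter.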
diff --git a/docs/workflows/ovms-predictor.md b/docs/workflows/ovms-predictor.md
index 998af7d..96c245c 100644
--- a/docs/workflows/ovms-predictor.md
+++ b/docs/workflows/ovms-predictor.md
@@ -4,18 +4,98 @@
 
 The OVMS predictor workflow step can be used to create and manage [OpenVINO Model Server inference servers](https://docs.openvino.ai/latest/openvino_docs_ovms.html) to serve input ML models as part of the execution of FuseML workflows. The OVMS predictor only accepts models in IR (Intermediate Representation) format as input. The [OVMS converter](ovms-converter.md) workflow extension can be used to convert models to IR format.
 
-The KServe predictor step expects a model URL to be supplied as input, pointing to the location in a remote model store where the model is stored. The protocols supported for the remote artifact store are the same as [those supported by the OVMS implementation](https://github.com/openvinotoolkit/model_server/tree/v2021.3/deploy#model-repository):
+The OVMS predictor step expects a model URL to be supplied as input, pointing to the location in a remote artifact store where the model is stored. The protocols supported for the remote artifact store are the same as [those supported by the OVMS implementation](https://github.com/openvinotoolkit/model_server/tree/v2021.3/deploy#model-repository):
 
 - AWS S3 or S3 compatible
-- GCS
-- Azure Blob Storage
+- GCS (authentication credentials not supported)
+- Azure Blob Storage (authentication credentials not supported)
 
 The predictor performs the following tasks:
 
 - it creates an OVMS prediction service instance to serve the model, and an Istio virtualservice that exposes the prediction service
 - it registers the OVMS prediction service with FuseML as an Application object. Information about the Application, such as the type and exposed inference URL can be retrieved at any time using the FuseML API and CLI.
 
+The OVMS predictor has a single output: the URL where the prediction service can be accessed to process inference requests.
+
+The Dockerfile and associated scripts that implement the OVMS predictor container image are available in the [FuseML extensions repository](https://github.com/fuseml/extensions/tree/main/images/inference-services/ovms).
+
+The OVMS predictor is featured in a number of FuseML tutorials, such as:
+
+- [Benchmarking ML Models on Intel CPUs with Intel OpenVINO](../tutorials/openvino-mlflow.md)
+- [FuseML Extension Development Use-Case - OpenVINO](../tutorials/openvino-extensions.md)
+
 ## Using the OVMS Predictor Step
 
-TBD
+The workflow example below shows how to create a FuseML workflow that uses the OVMS predictor step to serve a model from a remote artifact store. The example assumes the model is already in IR (Intermediate Representation) format. For models that are not in IR format, the [OVMS converter step](ovms-converter.md) can be used as part of the same workflow to convert the model before it is served with OVMS.
+
+```yaml
+name: ovms-workflow
+description: |
+  Workflow that takes in a URL containing one or more OVMS models in IR format
+  then uses OVMS to deploy a prediction service that can be used to run predictions
+  against the model.
+inputs:
+  - name: model
+    description: URL where an IR model is stored
+    type: string
+    default: gs://ovms-public-eu/resnet50-binary
+outputs:
+  - name: prediction-url
+    description: "The URL where the exposed prediction service endpoint can be contacted to run predictions."
+    type: string
+steps:
+  - name: ovms-predictor
+    image: ghcr.io/fuseml/ovms-predictor:latest
+    inputs:
+      - name: model
+        value: '{{ inputs.model }}'
+      - name: app_name
+        value: resnet50-ovms
+      - name: resources
+        value: '{"requests": {"cpu": 1}}'
+      - name: verbose
+        value: false
+    outputs:
+      - name: prediction-url
+    extensions:
+      - name: ovms-operator
+        service_resource: ovms-operator
+```
+
+Aside from the mandatory `model` input parameter that needs to be a URL pointing to the location of a trained ML model in IR format saved in an artifact store, the OVMS predictor workflow step also accepts the following optional input parameters that can be used to customize how the prediction service is created and updated:
+
+- `model_name`: the name used to identify the model in the OVMS prediction API (default value is: `default`)
+- `ovms_image_tag`: the version of the OVMS container image to use (default value is: `2021.4.1`)
+- `loglevel`, `nireq`, `plugin_config`, `batch_size`, `shape`, `target_device` and `layout` are parameters that are mapped to the corresponding OVMS deployment parameters. See the [OVMS documentation](https://github.com/openvinotoolkit/model_server/tree/main/deploy#helm-options-references) for a description of these parameters.
+- `replicas`: number of replicas for the OVMS prediction service
+- `resources`: can be used to configure Kubernetes resource requests and limits for the OVMS prediction service
+- `prediction_type`: determines [the type of TensorFlow Serving prediction requests](https://www.tensorflow.org/tfx/serving/api_rest#classify_and_regress_api) that the returned prediction URL is made for: `classify`, `regress` or `predict`. The default value is `predict`.
+- `app_name`: use this to explicitly set the name of the FuseML application used to represent the OVMS prediction service. Its value also determines the prediction URL as well as the names of the Kubernetes resources created by the OVMS predictor. If not set explicitly, the application name is constructed by combining the workflow name with the name and project of the input codeset, if one is provided as input. In the absence of an input codeset, the application name is generated by concatenating the workflow name with a randomly generated string.
+
+    !!! note
+
+        Choosing a value for the `app_name` parameter should be done with care, as it is used to uniquely identify a FuseML application and its associated Kubernetes resources (i.e. the name of the OVMS prediction service, prediction URL etc.). Reusing the same value in several workflows can lead to a situation where the same OVMS prediction service is managed by more than one FuseML workflow. In this case, the results can be unpredictable, because multiple workflows will compete over managing the same application.
+
+    !!! warning
+
+        If an `app_name` value is not provided and the predictor step doesn't receive an input codeset, the generated application name will be random, which means that every workflow run will create a new application and prediction service. This should be avoided, as it can easily lead to resource exhaustion.
+
+- `resources_limits` - use to set the Kubernetes resource limits to allocate hardware resources to the prediction service. E.g:
+
+    ```yaml
+    - name: resources_limits
+      value: '{nvidia.com/gpu: 1}'
+    ```
+
+- `verbose` - set to `true` to enable verbose logging in the predictor workflow step (default is `false`).
+
+The OVMS predictor workflow step can also take in some environment variables that are used to configure the credentials of the remote artifact store where the input ML model is stored:
+
+!!! note
+
+    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the [FuseML Extension Registry](../extensions/extension-registry.md) and only referenced in the FuseML workflows as [extension requirements](../extensions/extension-registry.md#referencing-extensions-in-workflows).
+
+- `S3_ENDPOINT` - required when the ML model is stored in a custom S3 compatible artifact store such as minio
+- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for the AWS S3 and S3-compatible artifact store
+
diff --git a/docs/workflows/seldon-core-predictor.md b/docs/workflows/seldon-core-predictor.md
index 3340d5e..98ef9fb 100644
--- a/docs/workflows/seldon-core-predictor.md
+++ b/docs/workflows/seldon-core-predictor.md
@@ -15,10 +15,115 @@ The predictor performs the following tasks:
 
 - downloads the model locally from the MLflow artifact store
 - if so instructed, it auto-detects the model format based on the information stored in the MLflow artifact store and decides which type of Seldon Core predictor server to use for it. Otherwise, it validates the model format against the type of predictor server specified as input.
 - it performs some minor conversion tasks required to adapt the input MLflow model directory layout to the one required by Seldon Core
+- it uploads the converted model to the same artifact store as the original model, in a different location (the converted model is stored in a subdirectory of the original model's location)
 - it creates a Seldon Core prediction service to serve the model
-- finally, it registers the Seldon Core prediction service with FuseML as an Application object. Information about the Application, such as the type and exposed inference URL can be retrieved at any time using the FuseML API and CLI.
+- finally, it registers the Seldon Core prediction service with FuseML as an Application object. Information about the Application, such as the type and exposed inference URL can be retrieved [through the FuseML CLI](../cli.md#applications) or [through the REST API](../api.md).
+
+The Seldon Core predictor has a single output: the URL where the prediction service can be accessed to process inference requests.
+
+The Dockerfile and associated scripts that implement the Seldon Core predictor container image are available in the [FuseML extensions repository](https://github.com/fuseml/extensions/tree/main/images/inference-services/seldon-core).
+
+The Seldon Core predictor is featured in a number of FuseML tutorials, such as:
+
+- [Logistic Regression with MLFlow & Seldon-Core](../tutorials/seldon-core.md)
+- [Benchmarking ML Models on Intel CPUs with Intel OpenVINO](../tutorials/openvino-mlflow.md)
 
 ## Using the Seldon Core Predictor Step
 
-TBD
+
+The recommended way to use the Seldon Core predictor step in a FuseML workflow is to have an MLflow trainer step as part of the same workflow and to reference its output model as input to the Seldon Core predictor, as shown in the example below.
+
+```yaml
+name: mlflow-e2e
+description: |
+  End-to-end pipeline template that takes in an MLFlow compatible codeset,
+  runs the MLFlow project to train a model, then uses Seldon Core to create a prediction
+  service that can be used to run predictions against the model.
+inputs:
+  - name: mlflow-codeset
+    description: an MLFlow compatible codeset
+    type: codeset
+  - name: predictor
+    description: type of predictor engine
+    type: string
+    default: auto
+outputs:
+  - name: prediction-url
+    description: "The URL where the exposed prediction service endpoint can be contacted to run predictions."
+    type: string
+steps:
+  - name: builder
+    image: ghcr.io/fuseml/mlflow-builder:latest
+    inputs:
+      - name: mlflow-codeset
+        codeset:
+          name: "{{ inputs.mlflow-codeset }}"
+          path: /project
+    outputs:
+      - name: image
+  - name: trainer
+    image: "{{ steps.builder.outputs.image }}"
+    inputs:
+      - name: mlflow-codeset
+        codeset:
+          name: "{{ inputs.mlflow-codeset }}"
+          path: "/project"
+    outputs:
+      - name: mlflow-model-url
+    extensions:
+      - name: mlflow-tracking
+        product: mlflow
+        service_resource: mlflow-tracking
+      - name: mlflow-store
+        product: mlflow
+        service_resource: s3
+  - name: predictor
+    image: ghcr.io/fuseml/seldon-core-predictor:latest
+    inputs:
+      - name: model
+        value: '{{ steps.trainer.outputs.mlflow-model-url }}'
+      - name: predictor
+        value: '{{ inputs.predictor }}'
+      - name: app_name
+        value: "{{ inputs.mlflow-codeset.name }}-{{ inputs.mlflow-codeset.project }}"
+    outputs:
+      - name: prediction-url
+    extensions:
+      - name: mlflow-s3-store
+        product: mlflow
+        service_resource: s3
+      - name: seldon-core
+        service_resource: seldon-core-api
+```
+
+Aside from the mandatory input `model` parameter that needs to be a URL pointing to the location of a trained ML model saved in an MLflow artifact store, the Seldon Core predictor workflow step also accepts the following optional input parameters that can be used to customize how the prediction service is created and updated:
+
+- `predictor`: this can be used to configure the type of Seldon Core predictor engine used for the prediction service. This can take the following values:
+
+    - `auto`: if this value is used, the Seldon Core predictor will automatically detect the type of model from the MLflow metadata present in the artifact store and use the appropriate predictor engine: TensorFlow Serving for TensorFlow and Keras models and the scikit-learn predictor for scikit-learn pickled models
+    - `tensorflow`: use to serve models with the TensorFlow Serving engine. Only works with models trained with TensorFlow or Keras and saved using the TensorFlow saved_model model format.
+    - `sklearn`: use to serve models trained with scikit-learn and saved in the sklearn pickled model format
+    - `triton`: use the NVidia Triton prediction back-end. Works with models trained with TensorFlow or Keras and saved using the TensorFlow saved_model format and with models in ONNX format.
+
+- `app_name`: use this to explicitly set the name of the FuseML application used to represent the Seldon Core prediction service. Its value also determines the prediction URL as well as the names of the Kubernetes resources created by the Seldon Core predictor. Our example uses an expression to dynamically set the `app_name` parameter to the name and project of the MLflow codeset used as workflow input. If not set, the application name is constructed by combining the workflow name with the name and project of the input codeset, if one is provided as input. In the absence of an input codeset, the application name is generated by concatenating the workflow name with a randomly generated string.
+
+    !!! note
+
+        Choosing a value for the `app_name` parameter should be done with care, as it is used to uniquely identify a FuseML application and its associated Kubernetes resources (i.e. the name of the Seldon Core prediction service, prediction URL etc.). Reusing the same value in several workflows can lead to a situation where the same Seldon Core prediction service is managed by more than one FuseML workflow. In this case, the results can be unpredictable, because multiple workflows will compete over managing the same application.
+
+    !!! warning
+
+        If an `app_name` value is not provided and the predictor step doesn't receive an input codeset, the generated application name will be random, which means that every workflow run will create a new application and prediction service. This should be avoided, as it can easily lead to resource exhaustion.
+
+- `verbose` - set to `true` to enable verbose logging in the predictor workflow step (default is `false`).
+
+The Seldon Core predictor workflow step can also take in some environment variables that are used to configure the credentials of the remote MLflow artifact store where the input ML model is stored:
+
+!!! note
+
+    Some of these environment variables contain sensitive data, such as keys and passwords and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the [FuseML Extension Registry](../extensions/extension-registry.md) and only referenced in the FuseML workflows as [extension requirements](../extensions/extension-registry.md#referencing-extensions-in-workflows).
+
+- `MLFLOW_S3_ENDPOINT_URL` - required when the ML model is stored in a custom S3 compatible artifact store such as minio
+- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for the AWS S3 and S3-compatible artifact store
+
+Observe how the `mlflow-s3-store` extension requirement is used in the `predictor` step to reference an MLflow artifact store backend registered in the [FuseML Extension Registry](../extensions/extension-registry.md). This avoids having to configure credentials and other environment variables explicitly in the FuseML workflow. The FuseML workflow engine automatically resolves these references to matching records available in the FuseML Extension Registry and passes the configuration entries in the extension records as environment variables to the workflow step container (i.e. variables like `MLFLOW_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).
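+
+For completeness, a workflow definition like the one above is typically saved to a file, registered with FuseML and then assigned to a codeset, which triggers a workflow run. The sketch below follows the CLI syntax used in the tutorials; the file and codeset names are placeholders:
+
+```bash
+# Register the workflow definition with FuseML
+fuseml workflow create mlflow-e2e.yaml
+
+# Assign the workflow to an MLflow codeset; this triggers a workflow run that
+# trains the model and deploys the Seldon Core prediction service
+fuseml workflow assign --name mlflow-e2e --codeset-name mlflow-test --codeset-project mlflow-project-01
+```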
diff --git a/docs/workflows/workflows.md b/docs/workflows/workflows.md
index cce65a7..d86a410 100644
--- a/docs/workflows/workflows.md
+++ b/docs/workflows/workflows.md
@@ -27,7 +27,7 @@ outputs:
     type: string
 steps:
   - name: builder
-    image: ghcr.io/fuseml/mlflow-builder:v0.3.0
+    image: ghcr.io/fuseml/mlflow-builder:latest
     inputs:
       - name: mlflow-codeset
         codeset:
@@ -52,7 +52,7 @@ steps:
         product: mlflow
         service_resource: s3
   - name: predictor
-    image: ghcr.io/fuseml/kserve-predictor:v0.3.0
+    image: ghcr.io/fuseml/kserve-predictor:latest
    inputs:
       - name: model
         value: "{{ steps.trainer.outputs.mlflow-model-url }}"
@@ -110,7 +110,7 @@ The inputs, parameters and outputs declared globally are available to all steps
 ```yaml
 steps:
   - name: builder
-    image: ghcr.io/fuseml/mlflow-builder:v0.3.0
+    image: ghcr.io/fuseml/mlflow-builder:latest
     inputs:
       - name: mlflow-codeset
         codeset:
@@ -158,7 +158,7 @@ The second step in the workflow is responsible for executing the ML code in the
 
 ```yaml
   - name: predictor
-    image: ghcr.io/fuseml/kserve-predictor:v0.3.0
+    image: ghcr.io/fuseml/kserve-predictor:latest
     inputs:
       - name: model
         value: "{{ steps.trainer.outputs.mlflow-model-url }}"