
Commit

[auto-merge] branch-25.02 to branch-25.04 [skip ci] [bot] (#495)
auto-merge triggered by github actions on `branch-25.02` to create a PR
keeping `branch-25.04` up-to-date. If this PR is unable to be merged due
to conflicts, it will remain open until manually fixed.
nvauto authored Feb 14, 2025
2 parents 1f2059f + 1bc43fb commit 540b16a
Showing 9 changed files with 1,811 additions and 61 deletions.
26 changes: 14 additions & 12 deletions examples/ML+DL-Examples/Spark-DL/dl_inference/README.md
@@ -43,15 +43,18 @@ Below is a full list of the notebooks with links to the examples they are based

| | Framework | Notebook Name | Description | Link
| ------------- | ------------- | ------------- | ------------- | -------------
| 1 | PyTorch | Image Classification | Training a model to predict clothing categories in FashionMNIST, including accelerated inference with Torch-TensorRT. | [Link](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)
| 2 | PyTorch | Housing Regression | Training a model to predict housing prices in the California Housing Dataset, including accelerated inference with Torch-TensorRT. | [Link](https://github.com/christianversloot/machine-learning-articles/blob/main/how-to-create-a-neural-network-for-regression-with-pytorch.md)
| 3 | Tensorflow | Image Classification | Training a model to predict hand-written digits in MNIST. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/save_and_load.ipynb)
| 4 | Tensorflow | Keras Preprocessing | Training a model with preprocessing layers to predict likelihood of pet adoption in the PetFinder mini dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/structured_data/preprocessing_layers.ipynb)
| 5 | Tensorflow | Keras Resnet50 | Training ResNet-50 to perform flower recognition from flower images. | [Link](https://docs.databricks.com/en/_extras/notebooks/source/deep-learning/keras-metadata.html)
| 6 | Tensorflow | Text Classification | Training a model to perform sentiment analysis on the IMDB dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/text_classification.ipynb)
| 7+8 | HuggingFace | Conditional Generation | Sentence translation using the T5 text-to-text transformer for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/model_doc/t5#t5)
| 9+10 | HuggingFace | Pipelines | Sentiment analysis using HuggingFace pipelines for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/quicktour#pipeline-usage)
| 11 | HuggingFace | Sentence Transformers | Sentence embeddings using SentenceTransformers in Torch. | [Link](https://huggingface.co/sentence-transformers)
| 1 | HuggingFace | DeepSeek-R1 | LLM batch inference using the DeepSeek-R1-Distill-Llama reasoning model. | [Link](https://huggingface.co/deepseek-ai/DeepSeek-R1)
| 2 | HuggingFace | Gemma-7b | LLM batch inference using the lightweight Google Gemma-7b model. | [Link](https://huggingface.co/google/gemma-7b-it)
| 3 | HuggingFace | Sentence Transformers | Sentence embeddings using SentenceTransformers in Torch. | [Link](https://huggingface.co/sentence-transformers)
| 4+5 | HuggingFace | Conditional Generation | Sentence translation using the T5 text-to-text transformer for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/model_doc/t5#t5)
| 6+7 | HuggingFace | Pipelines | Sentiment analysis using HuggingFace pipelines for both Torch and Tensorflow. | [Link](https://huggingface.co/docs/transformers/quicktour#pipeline-usage)
| 8 | PyTorch | Image Classification | Training a model to predict clothing categories in FashionMNIST, and deploying with Torch-TensorRT accelerated inference. | [Link](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)
| 9 | PyTorch | Housing Regression | Training a model to predict housing prices in the California Housing Dataset, and deploying with Torch-TensorRT accelerated inference. | [Link](https://github.com/christianversloot/machine-learning-articles/blob/main/how-to-create-a-neural-network-for-regression-with-pytorch.md)
| 10 | Tensorflow | Image Classification | Training and deploying a model to predict hand-written digits in MNIST. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/save_and_load.ipynb)
| 11 | Tensorflow | Keras Preprocessing | Training and deploying a model with preprocessing layers to predict likelihood of pet adoption in the PetFinder mini dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/structured_data/preprocessing_layers.ipynb)
| 12 | Tensorflow | Keras Resnet50 | Deploying ResNet-50 to perform flower recognition from flower images. | [Link](https://docs.databricks.com/en/_extras/notebooks/source/deep-learning/keras-metadata.html)
| 13 | Tensorflow | Text Classification | Training and deploying a model to perform sentiment analysis on the IMDB dataset. | [Link](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/text_classification.ipynb)


## Running Locally

@@ -130,9 +133,8 @@ The notebooks use [PyTriton](https://github.com/triton-inference-server/pytriton
The diagram above shows how Spark distributes inference tasks to run on the Triton Inference Server, with PyTriton handling request/response communication with the server.

The process looks like this:
- Distribute a PyTriton task across the Spark cluster, instructing each worker to launch a Triton server process.
- Use stage-level scheduling to ensure there is a 1:1 mapping between worker nodes and servers.
- Define a Triton inference function, which contains a client that binds to the local server on a given worker and sends inference requests.
- Prior to inference, launch a Triton server process on each node.
- Define a Triton predict function, which creates a client that binds to the local server and sends/receives inference requests.
- Wrap the Triton predict function in a `predict_batch_udf` to launch parallel inference requests using Spark.
- Finally, distribute a shutdown signal to terminate the Triton server processes on each worker.

@@ -34,22 +34,25 @@
databricks workspace import $INIT_DEST --format AUTO --file $INIT_SRC
```

6. Launch the cluster with the provided script (note that the script specifies **Azure instances** by default; change as needed):
6. Launch the cluster with the provided script. By default the script will create a cluster with 4 A10 worker nodes and 1 A10 driver node. (Note that the script uses **Azure instances** by default; change as needed).
```shell
cd setup
chmod +x start_cluster.sh
./start_cluster.sh
```

OR, start the cluster from the Databricks UI:

- Go to `Compute > Create compute` and set the desired cluster settings.
- Integration with Triton inference server uses stage-level scheduling (Spark>=3.4.0). Make sure to:
- use a cluster with GPU resources
- use a cluster with GPU resources (for LLM examples, make sure the selected GPUs have sufficient memory)
- set a value for `spark.executor.cores`
- ensure that `spark.executor.resource.gpu.amount` = 1
- Under `Advanced Options > Init Scripts`, upload the init script from your workspace.
- Under environment variables, set `FRAMEWORK=torch` or `FRAMEWORK=tf` based on the notebook used.
- For Tensorflow notebooks, we recommend setting the environment variable `TF_GPU_ALLOCATOR=cuda_malloc_async` (especially for HuggingFace LLM models), which enables the CUDA driver to implicitly release unused memory from the pool.
- Under environment variables, set:
- `FRAMEWORK=torch` or `FRAMEWORK=tf` based on the notebook used.
- `HF_HOME=/dbfs/FileStore/hf_home` to cache HuggingFace models in DBFS.
- `TF_GPU_ALLOCATOR=cuda_malloc_async` to implicitly release unused GPU memory in Tensorflow notebooks.
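The per-task GPU share follows the rule stated in the startup script's comments, `task.resource.gpu.amount = 1 / executor.cores`, which yields the `0.125` (8 cores) and `0.16667` (6 cores) values in its `spark_conf`. A minimal sketch for deriving the three stage-level scheduling settings from a core count, assuming Bash and `awk`; `task_gpu_amount` is a hypothetical helper, not part of the repository scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive spark.task.resource.gpu.amount from
# spark.executor.cores so that every task slot on an executor shares its one GPU.
task_gpu_amount() {
  local cores="$1"
  # Reject empty, non-numeric, or zero core counts.
  case "$cores" in
    ''|*[!0-9]*|0)
      echo "invalid core count: '${cores}'" >&2
      return 1
      ;;
  esac
  # 1/cores rounded to 5 significant digits (8 -> 0.125, 6 -> 0.16667).
  awk -v c="$cores" 'BEGIN { printf "%.5g\n", 1 / c }'
}

executor_cores=6
echo "spark.executor.cores ${executor_cores}"
echo "spark.executor.resource.gpu.amount 1"
echo "spark.task.resource.gpu.amount $(task_gpu_amount "${executor_cores}")"
```

The same three values can then be entered under the cluster's Spark config in the Databricks UI.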



7. Navigate to the notebook in your workspace and attach it to the cluster. The default cluster name is `spark-dl-inference-$FRAMEWORK`.
@@ -14,19 +14,22 @@ if [[ -z ${FRAMEWORK} ]]; then
exit 1
fi

# Modify the node_type_id and driver_node_type_id below if you don't have this specific instance type.
# Modify executor.cores=(cores per node) and task.resource.gpu.amount=(1/executor cores) accordingly.
# We recommend selecting A10/L4+ instances for these examples.
json_config=$(cat <<EOF
{
"cluster_name": "spark-dl-inference-${FRAMEWORK}",
"spark_version": "15.4.x-gpu-ml-scala2.12",
"spark_conf": {
"spark.executor.resource.gpu.amount": "1",
"spark.python.worker.reuse": "true",
"spark.task.resource.gpu.amount": "0.125",
"spark.sql.execution.arrow.pyspark.enabled": "true",
"spark.executor.cores": "8"
"spark.task.resource.gpu.amount": "0.16667",
"spark.executor.cores": "6"
},
"node_type_id": "Standard_NC8as_T4_v3",
"driver_node_type_id": "Standard_NC8as_T4_v3",
"node_type_id": "Standard_NV12ads_A10_v5",
"driver_node_type_id": "Standard_NV12ads_A10_v5",
"spark_env_vars": {
"TF_GPU_ALLOCATOR": "cuda_malloc_async",
"FRAMEWORK": "${FRAMEWORK}"
@@ -50,13 +50,12 @@
```shell
export FRAMEWORK=torch
```
Run the cluster startup script. The script will also retrieve and use the [spark-rapids initialization script](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/spark-rapids/spark-rapids.sh) to set up GPU resources.
Run the cluster startup script. The script will also retrieve and use the [spark-rapids initialization script](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/spark-rapids/spark-rapids.sh) to set up GPU resources. The script will create 4 L4 worker nodes and 1 L4 driver node by default, named `${USER}-spark-dl-inference-${FRAMEWORK}`.
```shell
cd setup
chmod +x start_cluster.sh
./start_cluster.sh
```
By default, the script creates a 4-node GPU cluster named `${USER}-spark-dl-inference-${FRAMEWORK}`.
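The default cluster name depends on `FRAMEWORK`, which the READMEs set to `torch` or `tf`. A minimal sketch, assuming Bash, of validating that variable and deriving the default name; `default_cluster_name` is hypothetical and not taken from `start_cluster.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of start_cluster.sh): validate FRAMEWORK and
# print the default Dataproc cluster name described above.
default_cluster_name() {
  local user="$1" framework="$2"
  case "$framework" in
    torch|tf) ;;
    *)
      echo "FRAMEWORK must be 'torch' or 'tf', got '${framework}'" >&2
      return 1
      ;;
  esac
  echo "${user}-spark-dl-inference-${framework}"
}

export FRAMEWORK=torch
# Fall back to whoami if USER is not set in the current shell.
default_cluster_name "${USER:-$(whoami)}" "${FRAMEWORK}"
```

With `USER=alice`, this prints `alice-spark-dl-inference-torch`, the name to look for under `Dataproc` > `Clusters`.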
7. Browse to the Jupyter web UI:
- Go to `Dataproc` > `Clusters` > `(Cluster Name)` > `Web Interfaces` > `Jupyter/Lab`
@@ -77,29 +77,29 @@ else
exit 1
fi

# start cluster if not already running
if gcloud dataproc clusters list | grep -q "${cluster_name}"; then
echo "Cluster ${cluster_name} already exists."
else
gcloud dataproc clusters create ${cluster_name} \
--image-version=2.2-ubuntu \
--region ${COMPUTE_REGION} \
--master-machine-type n1-standard-16 \
--num-workers 4 \
--worker-min-cpu-platform="Intel Skylake" \
--worker-machine-type n1-standard-16 \
--master-accelerator type=nvidia-tesla-t4,count=1 \
--worker-accelerator type=nvidia-tesla-t4,count=1 \
--initialization-actions gs://${SPARK_DL_HOME}/init/spark-rapids.sh,${INIT_PATH} \
--metadata gpu-driver-provider="NVIDIA" \
--metadata gcs-bucket=${GCS_BUCKET} \
--metadata spark-dl-home=${SPARK_DL_HOME} \
--metadata requirements="${requirements}" \
--worker-local-ssd-interface=NVME \
--optional-components=JUPYTER \
--bucket ${GCS_BUCKET} \
--enable-component-gateway \
--max-idle "60m" \
--subnet=default \
--no-shielded-secure-boot
exit 0
fi

CLUSTER_PARAMS=(
--image-version=2.2-ubuntu
--region ${COMPUTE_REGION}
--num-workers 4
--master-machine-type g2-standard-8
--worker-machine-type g2-standard-8
--initialization-actions gs://${SPARK_DL_HOME}/init/spark-rapids.sh,${INIT_PATH}
--metadata gpu-driver-provider="NVIDIA"
--metadata gcs-bucket=${GCS_BUCKET}
--metadata spark-dl-home=${SPARK_DL_HOME}
--metadata requirements="${requirements}"
--worker-local-ssd-interface=NVME
--optional-components=JUPYTER
--bucket ${GCS_BUCKET}
--enable-component-gateway
--max-idle "60m"
--subnet=default
--no-shielded-secure-boot
)

gcloud dataproc clusters create ${cluster_name} "${CLUSTER_PARAMS[@]}"
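The new version collects the `gcloud dataproc clusters create` flags in the `CLUSTER_PARAMS` Bash array and expands it as `"${CLUSTER_PARAMS[@]}"`. A quoted array expansion passes each element as exactly one argument, so a value containing spaces stays intact, whereas an unquoted flat string is word-split. A minimal demonstration, assuming Bash; the `requirements` value and region below are hypothetical, and `count_args` stands in for `gcloud` so the example runs without a GCP project:

```shell
#!/usr/bin/env bash
# Print how many separate arguments a command would receive.
count_args() { echo "$#"; }

# Hypothetical value; the real script builds ${requirements} elsewhere.
requirements="torch transformers"

# Array form, as in the new start_cluster.sh: each element is one argument.
PARAMS=(
  --region us-central1
  --metadata requirements="${requirements}"
)
count_args "${PARAMS[@]}"   # prints 4

# Flat-string form: unquoted expansion word-splits on the space in ${requirements}.
FLAT="--region us-central1 --metadata requirements=${requirements}"
count_args ${FLAT}          # prints 5
```

The array also keeps the flag list in one place, separate from the `gcloud` invocation itself.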