Update urllib3 requirement and cleanup instructions (#503)
- The urllib3<2 pin was an artifact of the old Databricks runtime we
were using and is no longer needed. Unpinning it allows the latest PyTriton
to be used.
- Install libstdcxx-ng by default, since the latest PyTriton requires GLIBCXX_3.4.30.
- While testing on CSPs, cleaned up and fixed some instructions.

---------

Signed-off-by: Rishi Chandra <rishic@nvidia.com>
rishic3 authored Feb 27, 2025
1 parent 3264e1c commit 14e9bbc
Showing 7 changed files with 19 additions and 20 deletions.
8 changes: 3 additions & 5 deletions examples/ML+DL-Examples/Spark-DL/dl_inference/README.md
@@ -69,19 +69,21 @@ Each notebook has a suffix `_torch` or `_tf` specifying the environment used.
```
conda create -n spark-dl-torch -c conda-forge python=3.11
conda activate spark-dl-torch
conda install -c conda-forge libstdcxx-ng
pip install -r torch_requirements.txt
```
**For TensorFlow:**
```
conda create -n spark-dl-tf -c conda-forge python=3.11
conda activate spark-dl-tf
conda install -c conda-forge libstdcxx-ng
pip install -r tf_requirements.txt
```
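To verify that the environment provides the `GLIBCXX_3.4.30` symbol required by recent PyTriton versions, a quick check (with the conda environment activated) is:
```shell
# List the GLIBCXX symbol versions exposed by the conda-provided libstdc++.
strings ${CONDA_PREFIX}/lib/libstdc++.so.6 | grep GLIBCXX_3.4.30
```
If nothing is printed, re-run the `conda install -c conda-forge libstdcxx-ng` step above.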

#### Start Cluster

For demonstration, these instructions just use a local Standalone cluster with a single executor, but they can be run on any distributed Spark cluster. For cloud environments, see [below](#running-on-cloud-environments).

If you haven't already, [install Spark](https://spark.apache.org/downloads.html) on your system.
```shell
# Replace with your Spark installation path
export SPARK_HOME=</path/to/spark>
@@ -114,10 +116,6 @@ If you encounter issues starting the Triton server, you may need to link your libstdc++
```shell
ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ${CONDA_PREFIX}/lib/libstdc++.so.6
```
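As a sanity check (not part of the original instructions), you can confirm where the conda environment's libstdc++ now points:
```shell
# Confirm the conda env's libstdc++ resolves to the system library.
ls -l ${CONDA_PREFIX}/lib/libstdc++.so.6
```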
If the issue persists with the message `libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, you may need to update libstdc++ in your conda environment:
```shell
conda install -c conda-forge libstdcxx-ng
```

## Running on Cloud Environments

20 changes: 12 additions & 8 deletions examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md
@@ -1,6 +1,7 @@
# Spark DL Inference on Databricks

**Note**: fields in \<brackets\> require user inputs.
**Note**: fields in \<brackets\> require user inputs.
Make sure you are in [this](./) directory.

## Setup

@@ -9,16 +10,19 @@
2. Specify the path to your Databricks workspace:
```shell
export WS_PATH=</Users/someone@example.com>
```

export NOTEBOOK_DEST=${WS_PATH}/spark-dl/notebook_torch.ipynb
export UTILS_DEST=${WS_PATH}/spark-dl/pytriton_utils.py
export INIT_DEST=${WS_PATH}/spark-dl/init_spark_dl.sh
```shell
export SPARK_DL_WS=${WS_PATH}/spark-dl
databricks workspace mkdirs ${SPARK_DL_WS}
```
3. Specify the local paths to the notebook you wish to run, the utils file, and the init script.
As an example for a PyTorch notebook:
```shell
export NOTEBOOK_SRC=</path/to/notebook_torch.ipynb>
export UTILS_SRC=</path/to/pytriton_utils.py>
```
```shell
export UTILS_SRC=$(realpath ../pytriton_utils.py)
export INIT_SRC=$(pwd)/setup/init_spark_dl.sh
```
4. Specify the framework as torch or tf, corresponding to the notebook you wish to run. Continuing with the PyTorch example:
@@ -29,9 +33,9 @@

5. Copy the files to the Databricks Workspace:
```shell
databricks workspace import $NOTEBOOK_DEST --format JUPYTER --file $NOTEBOOK_SRC
databricks workspace import $UTILS_DEST --format AUTO --file $UTILS_SRC
databricks workspace import $INIT_DEST --format AUTO --file $INIT_SRC
databricks workspace import ${SPARK_DL_WS}/notebook_torch.ipynb --format JUPYTER --file $NOTEBOOK_SRC
databricks workspace import ${SPARK_DL_WS}/pytriton_utils.py --format AUTO --file $UTILS_SRC
databricks workspace import ${SPARK_DL_WS}/init_spark_dl.sh --format AUTO --file $INIT_SRC
```
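As a quick sanity check (assuming the Databricks CLI is configured), you can list the workspace directory to confirm the files were uploaded:
```shell
# List the uploaded files in the workspace directory.
databricks workspace list ${SPARK_DL_WS}
```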

6. Launch the cluster with the provided script. By default the script will create a cluster with 4 A10 worker nodes and 1 A10 driver node. (Note that the script uses **Azure instances** by default; change as needed).
@@ -10,7 +10,6 @@ if [[ "${FRAMEWORK}" == "torch" ]]; then
cat <<EOF > temp_requirements.txt
datasets==3.*
transformers
urllib3<2
nvidia-pytriton
torch<=2.5.1
torchvision --extra-index-url https://download.pytorch.org/whl/cu121
@@ -24,7 +23,6 @@ elif [[ "${FRAMEWORK}" == "tf" ]]; then
cat <<EOF > temp_requirements.txt
datasets==3.*
transformers
urllib3<2
nvidia-pytriton
EOF
else
@@ -26,7 +26,7 @@ json_config=$(cat <<EOF
"spark.python.worker.reuse": "true",
"spark.sql.execution.arrow.pyspark.enabled": "true",
"spark.task.resource.gpu.amount": "0.16667",
"spark.executor.cores": "6"
"spark.executor.cores": "12"
},
"node_type_id": "Standard_NV12ads_A10_v5",
"driver_node_type_id": "Standard_NV12ads_A10_v5",
@@ -2,7 +2,8 @@

## Setup

**Note**: fields in \<brackets\> require user inputs.
**Note**: fields in \<brackets\> require user inputs.
Make sure you are in [this](./) directory.

#### Setup GCloud CLI

@@ -41,7 +42,7 @@
5. Copy the utils file to the GCS bucket.
```shell
gcloud storage cp </path/to/pytriton_utils.py> gs://${SPARK_DL_HOME}/
gcloud storage cp $(realpath ../pytriton_utils.py) gs://${SPARK_DL_HOME}/
```
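As a quick check, you can list the bucket path to confirm the copy succeeded:
```shell
# Verify the utils file is present in the GCS bucket.
gcloud storage ls gs://${SPARK_DL_HOME}/
```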
#### Start cluster and run
@@ -45,7 +45,6 @@ scikit-learn
huggingface
datasets==3.*
transformers
urllib3<2
nvidia-pytriton"

TORCH_REQUIREMENTS="${COMMON_REQUIREMENTS}
@@ -26,5 +26,4 @@ huggingface
datasets
transformers
ipywidgets
urllib3<2
nvidia-pytriton
