diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/README.md b/examples/ML+DL-Examples/Spark-DL/dl_inference/README.md
index 0286a97d..1609a0fd 100644
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/README.md
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/README.md
@@ -69,19 +69,21 @@ Each notebook has a suffix `_torch` or `_tf` specifying the environment used.
 ```
 conda create -n spark-dl-torch -c conda-forge python=3.11
 conda activate spark-dl-torch
+conda install -c conda-forge libstdcxx-ng
 pip install -r torch_requirements.txt
 ```
 **For TensorFlow:**
 ```
 conda create -n spark-dl-tf -c conda-forge python=3.11
 conda activate spark-dl-tf
+conda install -c conda-forge libstdcxx-ng
 pip install -r tf_requirements.txt
 ```
 
 #### Start Cluster
 
 For demonstration, these instructions just use a local Standalone cluster with a single executor, but they can be run on any distributed Spark cluster. For cloud environments, see [below](#running-on-cloud-environments).
-
+If you haven't already, [install Spark](https://spark.apache.org/downloads.html) on your system.
 ```shell
 # Replace with your Spark installation path
 export SPARK_HOME=
@@ -114,10 +116,6 @@ If you encounter issues starting the Triton server, you may need to link your li
 ```shell
 ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ${CONDA_PREFIX}/lib/libstdc++.so.6
 ```
-If the issue persists with the message `libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, you may need to update libstdc++ in your conda environment:
-```shell
-conda install -c conda-forge libstdcxx-ng
-```
 
 ## Running on Cloud Environments
 
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md
index cc760522..26edfb85 100644
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/README.md
@@ -1,6 +1,7 @@
 # Spark DL Inference on Databricks
 
-**Note**: fields in \<brackets\> require user inputs.
+**Note**: fields in \<brackets\> require user inputs.
+Make sure you are in [this](./) directory.
 
 ## Setup
 
@@ -9,16 +10,19 @@
 2. Specify the path to your Databricks workspace:
     ```shell
     export WS_PATH=
+    ```
-    export NOTEBOOK_DEST=${WS_PATH}/spark-dl/notebook_torch.ipynb
-    export UTILS_DEST=${WS_PATH}/spark-dl/pytriton_utils.py
-    export INIT_DEST=${WS_PATH}/spark-dl/init_spark_dl.sh
+    ```shell
+    export SPARK_DL_WS=${WS_PATH}/spark-dl
+    databricks workspace mkdirs ${SPARK_DL_WS}
     ```
 
 3. Specify the local paths to the notebook you wish to run, the utils file, and the init script. As an example for a PyTorch notebook:
     ```shell
     export NOTEBOOK_SRC=
-    export UTILS_SRC=
+    ```
+    ```shell
+    export UTILS_SRC=$(realpath ../pytriton_utils.py)
     export INIT_SRC=$(pwd)/setup/init_spark_dl.sh
     ```
 
 4. Specify the framework to torch or tf, corresponding to the notebook you wish to run. Continuing with the PyTorch example:
@@ -29,9 +33,9 @@
 
 5. Copy the files to the Databricks Workspace:
     ```shell
-    databricks workspace import $NOTEBOOK_DEST --format JUPYTER --file $NOTEBOOK_SRC
-    databricks workspace import $UTILS_DEST --format AUTO --file $UTILS_SRC
-    databricks workspace import $INIT_DEST --format AUTO --file $INIT_SRC
+    databricks workspace import ${SPARK_DL_WS}/notebook_torch.ipynb --format JUPYTER --file $NOTEBOOK_SRC
+    databricks workspace import ${SPARK_DL_WS}/pytriton_utils.py --format AUTO --file $UTILS_SRC
+    databricks workspace import ${SPARK_DL_WS}/init_spark_dl.sh --format AUTO --file $INIT_SRC
     ```
 
 6. Launch the cluster with the provided script. By default the script will create a cluster with 4 A10 worker nodes and 1 A10 driver node. (Note that the script uses **Azure instances** by default; change as needed).
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh
index 9515f435..e8e60c93 100755
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/init_spark_dl.sh
@@ -10,7 +10,6 @@ if [[ "${FRAMEWORK}" == "torch" ]]; then
 cat <<EOF > temp_requirements.txt
 datasets==3.*
 transformers
-urllib3<2
 nvidia-pytriton
 torch<=2.5.1
 torchvision --extra-index-url https://download.pytorch.org/whl/cu121
@@ -24,7 +23,6 @@ elif [[ "${FRAMEWORK}" == "tf" ]]; then
 cat <<EOF > temp_requirements.txt
 datasets==3.*
 transformers
-urllib3<2
 nvidia-pytriton
 EOF
 else
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh
index 457b080b..7b37efc4 100755
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/databricks/setup/start_cluster.sh
@@ -26,7 +26,7 @@ json_config=$(cat <<EOF
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/README.md b/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/README.md
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/README.md
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/README.md
@@ -1,6 +1,7 @@
 # Spark DL Inference on Dataproc
 
-**Note**: fields in \<brackets\> require user inputs.
+**Note**: fields in \<brackets\> require user inputs.
+Make sure you are in [this](./) directory.
 
 #### Setup GCloud CLI
 
@@ -41,7 +42,7 @@
 5. Copy the utils file to the GCS bucket.
     ```shell
-    gcloud storage cp gs://${SPARK_DL_HOME}/
+    gcloud storage cp $(realpath ../pytriton_utils.py) gs://${SPARK_DL_HOME}/
     ```
 
 #### Start cluster and run
 
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/start_cluster.sh b/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/start_cluster.sh
index 35840bae..649a4805 100755
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/start_cluster.sh
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/dataproc/setup/start_cluster.sh
@@ -45,7 +45,6 @@ scikit-learn
 huggingface
 datasets==3.*
 transformers
-urllib3<2
 nvidia-pytriton"
 
 TORCH_REQUIREMENTS="${COMMON_REQUIREMENTS}
diff --git a/examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt b/examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt
index a0afb217..44c223f1 100644
--- a/examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt
+++ b/examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt
@@ -26,5 +26,4 @@ huggingface
 datasets
 transformers
 ipywidgets
-urllib3<2
 nvidia-pytriton
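A note on the `libstdcxx-ng` change above: the troubleshooting section this patch removes described the Triton server failing with `libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, and the fix is now applied up front during environment creation instead. Below is a quick sanity check that the environment actually picked up a new enough `libstdc++` (a minimal sketch, assuming a Linux x86_64 conda environment that is already activated and has `strings` from binutils available):

```shell
# nvidia-pytriton needs the GLIBCXX_3.4.30 symbol at runtime; after
# `conda install -c conda-forge libstdcxx-ng`, the env's own libstdc++
# should export it. Grep the version strings to confirm:
strings "${CONDA_PREFIX}/lib/libstdc++.so.6" | grep GLIBCXX_3.4.30
# Prints GLIBCXX_3.4.30 on success. If it prints nothing, fall back to
# the symlink workaround kept in the README:
#   ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ${CONDA_PREFIX}/lib/libstdc++.so.6
```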
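Similarly, since the Databricks README now imports into a single `${SPARK_DL_WS}` workspace folder rather than per-file `*_DEST` variables, a quick way to verify that all three imports landed (a sketch, assuming the Databricks CLI is configured and the setup steps above were run):

```shell
# List the workspace folder created by `databricks workspace mkdirs`;
# it should contain the notebook, the utils file, and the init script:
databricks workspace list ${SPARK_DL_WS}
# Expected entries: notebook_torch.ipynb, pytriton_utils.py, init_spark_dl.sh
```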