I deployed a Llama-3-8B model compiled with the TRT-LLM backend using all default parameters, and used the inflight-batcher example to create the model repository. I was able to serve it and get responses on a local deployment.
I then deployed the same model repository on Vertex AI with the Dockerfile below, and tested it with a sample payload:
# Use the official Triton Inference Server image with TensorRT-LLM support
FROM nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
# Set the working directory inside the container
WORKDIR /app
# Install necessary dependencies
RUN apt-get update && \
apt-get upgrade -y
# Install Google Cloud SDK for gsutil
RUN curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /etc/apt/trusted.gpg.d/google.asc
RUN echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee /etc/apt/sources.list.d/google-cloud-sdk.list
RUN apt-get update && apt-get install -y google-cloud-sdk
RUN rm -rf /usr/lib/google-cloud-sdk/bin/anthoscli
# Expose Triton's default HTTP port and the Vertex AI prediction port
EXPOSE 8000 8080
# Set environment variables for runtime configuration
ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
# Now you can access your GCS bucket without needing a service account key
RUN gsutil -m cp -r gs://llama-model-bucket/Meta-Llama-3-8B-Instruct /app/ && \
gsutil -m cp -r gs://llama-model-bucket/triton_model_repo /app/
# Run Triton Inference Server on container start
ENTRYPOINT ["mpirun", "-n", "1"]
CMD [ "tritonserver", "--model-repository=/app/triton_model_repo/", "--vertex-ai-default-model=ensemble"]
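For reference, requests to the stock TRT-LLM inflight-batcher ensemble follow the KServe v2 inference schema. The sketch below builds such a payload in Python; the input names `text_input` and `max_tokens` come from the standard `inflight_batcher_llm` example config and are an assumption here — adjust them to match your own config.pbtxt:

```python
import json

# Build a KServe v2 inference request body for the "ensemble" model.
# Input names follow the stock TRT-LLM inflight_batcher_llm example;
# change them if your config.pbtxt declares different tensor names.
def build_payload(prompt: str, max_tokens: int = 64) -> str:
    body = {
        "inputs": [
            {"name": "text_input", "shape": [1, 1], "datatype": "BYTES",
             "data": [prompt]},
            {"name": "max_tokens", "shape": [1, 1], "datatype": "INT32",
             "data": [max_tokens]},
        ]
    }
    return json.dumps(body)

if __name__ == "__main__":
    print(build_payload("What is machine learning?"))
```

Locally this body can be POSTed to `http://localhost:8000/v2/models/ensemble/infer`; on Vertex AI the same body goes to the endpoint's rawPredict route, which Vertex forwards to the port and path set by the `AIP_HTTP_PORT` and `AIP_PREDICT_ROUTE` environment variables.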
It throws the following error:
Below is my config.pbtxt: