Refactor codegen codetrans faqgen visualqna and guardrails-usvc (#710)
* Adapt guardrails-usvc
* Refactor e2e chart: codegen codetrans faqgen visualqna
* llm-uservice: Adapt to API change

Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
lianhao authored Jan 17, 2025
1 parent ab51131 commit e99c965
Showing 14 changed files with 142 additions and 46 deletions.
5 changes: 5 additions & 0 deletions helm-charts/codegen/gaudi-values.yaml
@@ -13,6 +13,11 @@ tgi:
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: ""
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
ENABLE_HPU_GRAPH: "true"
LIMIT_HPU_GRAPH: "true"
USE_FLASH_ATTENTION: "true"
FLASH_ATTENTION_RECOMPUTE: "true"
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 5
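For reference, these Gaudi-specific TGI settings are normally applied by passing this values file at install time. A minimal sketch, assuming a local checkout of the charts; the release name `codegen` and the token placeholder are illustrative:

```console
cd GenAIInfra/helm-charts
export HFTOKEN="insert-your-huggingface-token-here"
helm dependency update codegen
helm install codegen ./codegen \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  -f codegen/gaudi-values.yaml
```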
3 changes: 3 additions & 0 deletions helm-charts/codegen/values.yaml
@@ -60,6 +60,9 @@ affinity: {}
tgi:
LLM_MODEL_ID: Qwen/Qwen2.5-Coder-7B-Instruct

llm-uservice:
LLM_MODEL_ID: Qwen/Qwen2.5-Coder-7B-Instruct

nginx:
service:
type: NodePort
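The new `llm-uservice.LLM_MODEL_ID` entry mirrors `tgi.LLM_MODEL_ID`, so an install-time model override should set both. A hedged sketch; the release name and model are illustrative:

```console
helm install codegen ./codegen \
  --set tgi.LLM_MODEL_ID=Qwen/Qwen2.5-Coder-7B-Instruct \
  --set llm-uservice.LLM_MODEL_ID=Qwen/Qwen2.5-Coder-7B-Instruct
```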
5 changes: 5 additions & 0 deletions helm-charts/codetrans/gaudi-values.yaml
@@ -12,6 +12,11 @@ tgi:
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: ""
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
ENABLE_HPU_GRAPH: "true"
LIMIT_HPU_GRAPH: "true"
USE_FLASH_ATTENTION: "true"
FLASH_ATTENTION_RECOMPUTE: "true"
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 5
3 changes: 3 additions & 0 deletions helm-charts/codetrans/values.yaml
@@ -61,6 +61,9 @@ affinity: {}
tgi:
LLM_MODEL_ID: mistralai/Mistral-7B-Instruct-v0.3

llm-uservice:
LLM_MODEL_ID: mistralai/Mistral-7B-Instruct-v0.3

nginx:
service:
type: NodePort
53 changes: 34 additions & 19 deletions helm-charts/common/guardrails-usvc/README.md
@@ -1,40 +1,54 @@
# guardrails-usvc

Helm chart for deploying LLM microservice.
Helm chart for deploying the guardrails microservice.

guardrails-usvc depends on TGI, you should set TGI_LLM_ENDPOINT as tgi endpoint.
## Installing the chart

## (Option1): Installing the chart separately
`guardrails-usvc` depends on the following inference backend services:

First, you need to install the tgi chart, please refer to the [tgi](../tgi) chart for more information. Please use model `meta-llama/Meta-Llama-Guard-2-8B` during installation.
- TGI: please refer to [tgi](../tgi) chart for more information

After you've deployted the tgi chart successfully, please run `kubectl get svc` to get the tgi service endpoint, i.e. `http://tgi`.
### Use Meta Llama Guard models (default):

To install the chart, run the following:
First, you need to install the `tgi` helm chart using the model `meta-llama/Meta-Llama-Guard-2-8B`.

After you've deployed the dependent chart successfully, please run `kubectl get svc` to get the backend inference service endpoint, e.g. `http://tgi`.

To install the `guardrails-usvc` chart, run the following:

```console
cd GenAIInfra/helm-charts/common/guardrails-usvc
helm dependency update
export HFTOKEN="insert-your-huggingface-token-here"
export SAFETY_GUARD_ENDPOINT="http://tgi"
export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
helm dependency update
helm install guardrails-usvc . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set SAFETY_GUARD_ENDPOINT=${SAFETY_GUARD_ENDPOINT} --set SAFETY_GUARD_MODEL_ID=${SAFETY_GUARD_MODEL_ID} --wait
export GUARDRAIL_BACKEND="LLAMA"
helm install guardrails-usvc . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set SAFETY_GUARD_ENDPOINT=${SAFETY_GUARD_ENDPOINT} --set SAFETY_GUARD_MODEL_ID=${SAFETY_GUARD_MODEL_ID} --set GUARDRAIL_BACKEND=${GUARDRAIL_BACKEND} --wait
```

## (Option2): Installing the chart with dependencies automatically
### Use Allen Institute for AI's WildGuard models:

First, you need to install the `tgi` helm chart using the model `allenai/wildguard`.

After you've deployed the dependent chart successfully, please run `kubectl get svc` to get the backend inference service endpoint, e.g. `http://tgi`.

To install the `guardrails-usvc` chart, run the following:

```console
cd GenAIInfra/helm-charts/common/guardrails-usvc
export HFTOKEN="insert-your-huggingface-token-here"
helm dependency update
helm install guardrails-usvc . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set tgi-guardrails.enabled=true --wait
export HFTOKEN="insert-your-huggingface-token-here"
export SAFETY_GUARD_ENDPOINT="http://tgi"
export SAFETY_GUARD_MODEL_ID="allenai/wildguard"
export GUARDRAIL_BACKEND="WILD"
helm install guardrails-usvc . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set SAFETY_GUARD_ENDPOINT=${SAFETY_GUARD_ENDPOINT} --set SAFETY_GUARD_MODEL_ID=${SAFETY_GUARD_MODEL_ID} --set GUARDRAIL_BACKEND=${GUARDRAIL_BACKEND} --wait
```

## Verify

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.

Then run the command `kubectl port-forward svc/guardrails-usvc 9090:9090` to expose the llm-uservice service for access.
Then run the command `kubectl port-forward svc/guardrails-usvc 9090:9090` to expose the guardrails-usvc service for access.

Open another terminal and run the following command to verify that the service is working:

@@ -47,10 +61,11 @@ curl http://localhost:9090/v1/guardrails \

## Values

| Key | Type | Default | Description |
| ------------------------------- | ------ | ------------------------------------ | ------------------------------------------------ |
| global.HUGGINGFACEHUB_API_TOKEN | string | `""` | Your own Hugging Face API token |
| image.repository | string | `"opea/guardrails-usvc"` | |
| service.port | string | `"9090"` | |
| SAFETY_GUARD_ENDPOINT | string | `""` | LLM endpoint |
| SAFETY_GUARD_MODEL_ID | string | `"meta-llama/Meta-Llama-Guard-2-8B"` | Model ID for the underlying LLM service is using |
| Key | Type | Default | Description |
| ------------------------------- | ------ | ------------------------------------ | --------------------------------------------------------------- |
| global.HUGGINGFACEHUB_API_TOKEN | string | `""` | Your own Hugging Face API token |
| image.repository | string | `"opea/guardrails-usvc"` | |
| service.port | string | `"9090"` | |
| SAFETY_GUARD_ENDPOINT | string | `""` | LLM endpoint |
| SAFETY_GUARD_MODEL_ID           | string | `"meta-llama/Meta-Llama-Guard-2-8B"` | Model ID used by the underlying LLM service                      |
| GUARDRAIL_BACKEND               | string | `"LLAMA"`                            | Guardrail model family to use, one of `LLAMA` or `WILD`          |
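For illustration, a hedged sketch of the verification call referenced in the README above; the request field name is an assumption and should be adjusted to the actual guardrails API schema:

```console
kubectl port-forward svc/guardrails-usvc 9090:9090 &
# "text" is an assumed field name; adjust to the deployed guardrails API.
curl http://localhost:9090/v1/guardrails \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"text":"How do I pick a lock?"}'
```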
7 changes: 7 additions & 0 deletions helm-charts/common/guardrails-usvc/templates/configmap.yaml
@@ -14,6 +14,13 @@ data:
SAFETY_GUARD_ENDPOINT: "http://{{ .Release.Name }}-tgi-guardrails"
{{- end }}
SAFETY_GUARD_MODEL_ID: {{ .Values.SAFETY_GUARD_MODEL_ID | quote }}
{{- if eq "LLAMA" .Values.GUARDRAIL_BACKEND }}
GUARDRAILS_COMPONENT_NAME: "OPEA_LLAMA_GUARD"
{{- else if eq "WILD" .Values.GUARDRAIL_BACKEND }}
GUARDRAILS_COMPONENT_NAME: "OPEA_WILD_GUARD"
{{- else }}
{{- cat "Invalid GUARDRAIL_BACKEND:" .Values.GUARDRAIL_BACKEND | fail }}
{{- end }}
HUGGINGFACEHUB_API_TOKEN: {{ .Values.global.HUGGINGFACEHUB_API_TOKEN | quote}}
HF_HOME: "/tmp/.cache/huggingface"
LOGFLAG: {{ .Values.LOGFLAG | quote }}
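To confirm which component name the template resolves for a given backend, the chart can be rendered locally. A sketch, assuming it is run from `helm-charts/common/guardrails-usvc` after `helm dependency update`:

```console
helm template guardrails-usvc . \
  --set global.HUGGINGFACEHUB_API_TOKEN=dummy \
  --set GUARDRAIL_BACKEND=WILD | grep GUARDRAILS_COMPONENT_NAME
# Expected (sketch): GUARDRAILS_COMPONENT_NAME: "OPEA_WILD_GUARD"
```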
32 changes: 31 additions & 1 deletion helm-charts/common/guardrails-usvc/templates/deployment.yaml
@@ -28,8 +28,38 @@ spec:
serviceAccountName: {{ include "guardrails-usvc.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
initContainers:
- name: wait-for-llm
envFrom:
- configMapRef:
name: {{ include "guardrails-usvc.fullname" . }}-config
{{- if .Values.global.extraEnvConfig }}
- configMapRef:
name: {{ .Values.global.extraEnvConfig }}
optional: true
{{- end }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: busybox:1.36
command: ["sh", "-c"]
args:
- |
proto=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\1/p');
host=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\2/p');
port=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\3/p');
if [ -z "$port" ]; then
port=80;
[[ "$proto" = "https" ]] && port=443;
fi;
retry_count={{ .Values.retryCount | default 60 }};
j=1;
while ! nc -z ${host} ${port}; do
[[ $j -ge ${retry_count} ]] && echo "ERROR: ${host}:${port} is NOT reachable in $j seconds!" && exit 1;
j=$((j+1)); sleep 1;
done;
echo "${host}:${port} is reachable within $j seconds.";
containers:
- name: {{ .Release.Name }}
- name: {{ .Chart.Name }}
envFrom:
- configMapRef:
name: {{ include "guardrails-usvc.fullname" . }}-config
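The `wait-for-llm` init container uses the `sed` expressions above to split `SAFETY_GUARD_ENDPOINT` into protocol, host, and port, defaulting to 80 or 443 when no port is given. A standalone sketch of that parsing for a hypothetical endpoint:

```console
SAFETY_GUARD_ENDPOINT="https://tgi-guardrails.example.svc:8080"   # hypothetical endpoint
proto=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\1/p')   # -> https
host=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\2/p')    # -> tgi-guardrails.example.svc
port=$(echo ${SAFETY_GUARD_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\3/p')    # -> 8080
if [ -z "$port" ]; then port=80; [ "$proto" = "https" ] && port=443; fi
echo "$proto $host $port"
```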
43 changes: 22 additions & 21 deletions helm-charts/common/guardrails-usvc/values.yaml
@@ -5,22 +5,29 @@
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

tgi-guardrails:
enabled: false
LLM_MODEL_ID: "meta-llama/Meta-Llama-Guard-2-8B"
# Configurations for OPEA microservice guardrails-usvc
# Set it as a non-null string, such as true, if you want to enable logging.
LOGFLAG: ""

replicaCount: 1
# settings for guardrails service
# guardrail model family to use:
# default is Meta's Llama Guard
GUARDRAIL_BACKEND: "LLAMA"
# Guard Model ID, should be same as the TGI's LLM_MODEL_ID
SAFETY_GUARD_MODEL_ID: "meta-llama/Meta-Llama-Guard-2-8B"

# Uncomment and set the following settings to use Allen Institute for AI's WildGuard
# GUARDRAIL_BACKEND: "WILD"
# Guard Model ID, should be same as the TGI's LLM_MODEL_ID
# SAFETY_GUARD_MODEL_ID: "allenai/wildguard"

# TGI service endpoint
SAFETY_GUARD_ENDPOINT: ""
# Guard Model Id
SAFETY_GUARD_MODEL_ID: "meta-llama/Meta-Llama-Guard-2-8B"
# Set it as a non-null string, such as true, if you want to enable logging facility,
# otherwise, keep it as "" to disable it.
LOGFLAG: ""

replicaCount: 1

image:
repository: opea/guardrails-tgi
repository: opea/guardrails
# Uncomment the following line to set desired image pull policy if needed, as one of Always, IfNotPresent, Never.
# pullPolicy: ""
# Overrides the image tag whose default is the chart appVersion.
@@ -62,24 +69,13 @@ service:
port: 9090

resources:
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
requests:
cpu: 100m
memory: 128Mi

livenessProbe:
httpGet:
path: v1/health_check
port: guardrails-usvc
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 24
readinessProbe:
httpGet:
path: v1/health_check
@@ -109,3 +105,8 @@ global:
# If set, it will overwrite serviceAccount.name.
# If set, and serviceAccount.create is false, it will assume this service account is already created by others.
sharedSAName: ""

# for CI tests only
tgi-guardrails:
enabled: false
LLM_MODEL_ID: "meta-llama/Meta-Llama-Guard-2-8B"
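The `tgi-guardrails` subchart now lives at the bottom of the values file and stays disabled by default; it is intended for CI-style all-in-one testing only. A hedged sketch of enabling it explicitly:

```console
helm dependency update
helm install guardrails-usvc . \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set tgi-guardrails.enabled=true --wait
```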
4 changes: 2 additions & 2 deletions helm-charts/common/llm-uservice/templates/tests/test-pod.yaml
@@ -19,10 +19,10 @@ spec:
- |
{{- if contains "llm-docsum" .Values.image.repository }}
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/docsum";
body='{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
body='{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
{{- else if contains "llm-faqgen" .Values.image.repository }}
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/faqgen";
body='{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
body='{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
{{- else }}
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/chat/completions";
body='{"model": "{{ .Values.LLM_MODEL_ID }}", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}';
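The test request body switches from `query` to `messages`, matching the llm-uservice API change noted in the commit message. A hedged manual check against a deployed faqgen llm-uservice; the service name and port 9000 are assumptions:

```console
kubectl port-forward svc/faqgen-llm-uservice 9000:9000 &
curl http://localhost:9000/v1/faqgen \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models.","max_tokens":17}'
```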
6 changes: 6 additions & 0 deletions helm-charts/common/tgi/templates/configmap.yaml
@@ -57,3 +57,9 @@ data:
{{- if .Values.FLASH_ATTENTION_RECOMPUTE }}
FLASH_ATTENTION_RECOMPUTE: {{ .Values.FLASH_ATTENTION_RECOMPUTE | quote }}
{{- end }}
{{- if .Values.PREFILL_BATCH_BUCKET_SIZE }}
PREFILL_BATCH_BUCKET_SIZE: {{ .Values.PREFILL_BATCH_BUCKET_SIZE | quote }}
{{- end }}
{{- if .Values.BATCH_BUCKET_SIZE }}
BATCH_BUCKET_SIZE: {{ .Values.BATCH_BUCKET_SIZE | quote }}
{{- end }}
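The two new keys are only rendered into the TGI config map when they are set. A quick local check, sketched under the assumption that it is run from `helm-charts/common/tgi`:

```console
helm template tgi . \
  --set PREFILL_BATCH_BUCKET_SIZE=1 \
  --set BATCH_BUCKET_SIZE=8 | grep BUCKET_SIZE
# Expected (sketch):
#   PREFILL_BATCH_BUCKET_SIZE: "1"
#   BATCH_BUCKET_SIZE: "8"
```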
16 changes: 14 additions & 2 deletions helm-charts/faqgen/gaudi-values.yaml
@@ -9,9 +9,21 @@ tgi:
resources:
limits:
habana.ai/gaudi: 1
MAX_INPUT_LENGTH: "4096"
MAX_TOTAL_TOKENS: "8192"
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: "0"
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
ENABLE_HPU_GRAPH: "true"
LIMIT_HPU_GRAPH: "true"
USE_FLASH_ATTENTION: "true"
FLASH_ATTENTION_RECOMPUTE: "true"
PREFILL_BATCH_BUCKET_SIZE: 1
BATCH_BUCKET_SIZE: 8
extraCmdArgs:
- "--max-batch-total-tokens"
- "65536"
- "--max-batch-prefill-tokens"
- "4096"
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 5
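The `extraCmdArgs` entries map onto TGI's `--max-batch-total-tokens` and `--max-batch-prefill-tokens` flags. A hedged check that they reach the rendered Deployment, assuming it is run from `helm-charts/faqgen` after `helm dependency update`:

```console
helm template faqgen . -f gaudi-values.yaml \
  --set global.HUGGINGFACEHUB_API_TOKEN=dummy | grep -A 1 "max-batch"
```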
3 changes: 2 additions & 1 deletion helm-charts/faqgen/values.yaml
@@ -59,7 +59,8 @@ affinity: {}
# To override values in subchart llm-uservice
llm-uservice:
image:
repository: opea/llm-faqgen-tgi
repository: opea/llm-faqgen
LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct

# To override values in subchart tgi
tgi:
5 changes: 5 additions & 0 deletions helm-charts/visualqna/gaudi-values.yaml
@@ -16,6 +16,11 @@ tgi:
MAX_INPUT_LENGTH: "4096"
MAX_TOTAL_TOKENS: "8192"
CUDA_GRAPHS: ""
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
ENABLE_HPU_GRAPH: "true"
LIMIT_HPU_GRAPH: "true"
USE_FLASH_ATTENTION: "true"
FLASH_ATTENTION_RECOMPUTE: "true"
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 5
3 changes: 3 additions & 0 deletions helm-charts/visualqna/values.yaml
@@ -67,6 +67,9 @@ tgi:
MAX_TOTAL_TOKENS: "8192"
LLM_MODEL_ID: llava-hf/llava-v1.6-mistral-7b-hf

lvm-uservice:
LVM_BACKEND: "TGI"

nginx:
service:
type: NodePort
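The new `lvm-uservice.LVM_BACKEND` value selects TGI as the vision-language backend. A hedged install sketch that overrides it explicitly; the release name is arbitrary:

```console
helm install visualqna ./visualqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set lvm-uservice.LVM_BACKEND=TGI
```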
