feat: support self-hosted embedding service via BentoML #324

Open · wants to merge 9 commits into base: main

9 changes: 9 additions & 0 deletions .env.example
@@ -77,3 +77,12 @@ GOOGLE_CSE_ID=
# Miscellaneous options
# Skip loading Chroma.
OVERWRITE_CHROMA=true

# Enable a self-hosted SentenceEmbedding model served via BentoML
# For a local embedding service, run:
# docker run --rm -p 3001:3001 ghcr.io/bentoml/sentence-embedding-bento:latest --port 3001
# Then set the following env var:
# BENTOML_EMBEDDING_ENDPOINT=http://localhost:3001
# Instructions for customizing your embedding model server: https://github.com/bentoml/sentence-embedding-bento
BENTOML_EMBEDDING_ENDPOINT=
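
When this variable is set, the backend routes embeddings through the BentoML endpoint; when it is left empty, it falls back to the OpenAI (or Azure OpenAI) embedding path. The selection logic lives in `realtime_ai_character/database/chroma.py`, shown further down in this diff.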

22 changes: 21 additions & 1 deletion README.md
@@ -62,7 +62,7 @@ __Demo settings: Web, GPT4, ElevenLabs with voice clone, Chroma, Google Speech t

- ✅**Web**: [React JS](https://react.dev/), [Vanilla JS](http://vanilla-js.com/), [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)
- ✅**Mobile**: [Swift](https://developer.apple.com/swift/), [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)
- ✅**Backend**: [FastAPI](https://fastapi.tiangolo.com/), [SQLite](https://www.sqlite.org/index.html), [Docker](https://www.docker.com/)
- ✅**Backend**: [FastAPI](https://fastapi.tiangolo.com/), [SQLite](https://www.sqlite.org/index.html), [Docker](https://www.docker.com/), [BentoML](https://bentoml.com/)
- ✅**Data Ingestion**: [LlamaIndex](https://www.llamaindex.ai/), [Chroma](https://www.trychroma.com/)
- ✅**LLM Orchestration**: [LangChain](https://langchain.com/), [Chroma](https://www.trychroma.com/)
- ✅**LLM**: [OpenAI GPT3.5/4](https://platform.openai.com/docs/api-reference/chat), [Anthropic Claude 2](https://docs.anthropic.com/claude/docs/getting-started-with-claude)
@@ -159,6 +159,26 @@ ELEVEN_LABS_API_KEY=<api key>
```
</details>

### 4. (Optional) Prepare a self-hosted embedding service - BentoML Deployment Endpoint
<details><summary>👇click me</summary>

1. Install [Docker](https://docs.docker.com/engine/install/)

2. Run the text embedding service Docker image built with BentoML:

```bash
docker run --rm -p 3001:3001 ghcr.io/bentoml/sentence-embedding-bento:latest --port 3001
```

3. Set the text embedding endpoint in your `.env` file:

```
BENTOML_EMBEDDING_ENDPOINT=http://localhost:3001
```
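
To confirm the service is reachable before starting the backend, a quick check along these lines should work (a minimal sketch; it reuses the same `bentoml.client.Client` and `encode` API that `realtime_ai_character/database/chroma.py` calls in this PR):

```python
from bentoml.client import Client

# Connect to the locally running sentence-embedding service
client = Client.from_url("http://localhost:3001")

# The service exposes an `encode` API that returns one vector per input text
vectors = client.encode(["hello world"])
print(len(vectors), len(vectors[0]))  # number of texts, embedding dimension
```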

For cloud deployment options and for customizing your own embedding model, check out the source repo [here](https://github.com/bentoml/sentence-embedding-bento).
</details>

## 💿 Installation via Python
- **Step 1**. Clone the repo
```sh
7 changes: 7 additions & 0 deletions cli.py
@@ -115,10 +115,17 @@ def image_exists(name):
    return result.returncode == 0


@click.command(help="Run BentoML text embedding service locally via Docker at localhost:3001")
def run_embedding_service():
    click.secho("Launching BentoML SentenceEmbedding Service...", fg='green')
    # Publish port 3001 to match BENTOML_EMBEDDING_ENDPOINT=http://localhost:3001 in .env.example
    subprocess.run(["docker", "run", "--rm", "-p", "3001:3001",
                    "ghcr.io/bentoml/sentence-embedding-bento:latest", "--port", "3001"])


cli.add_command(docker_build)
cli.add_command(docker_run)
cli.add_command(docker_delete)
cli.add_command(run_uvicorn)
cli.add_command(run_embedding_service)
cli.add_command(web_build)
cli.add_command(docker_next_web_build)

35 changes: 30 additions & 5 deletions realtime_ai_character/database/chroma.py
@@ -1,22 +1,47 @@
import os
from dotenv import load_dotenv
from bentoml.client import Client
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.embeddings.base import Embeddings
from realtime_ai_character.logger import get_logger

load_dotenv()
logger = get_logger(__name__)

embedding = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))
if os.getenv('OPENAI_API_TYPE') == 'azure':
    embedding = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"), deployment=os.getenv(
        "OPENAI_API_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-ada-002"), chunk_size=1)

class BentoEmbeddings(Embeddings):
    """LangChain Embeddings adapter backed by a BentoML sentence-embedding service."""

    def __init__(self, embedding_svc_client: Client):
        self.client = embedding_svc_client

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self.client.encode(texts).tolist()

    def embed_query(self, text: str) -> list[float]:
        return self.client.encode([text]).tolist()[0]


embedding_endpoint = os.getenv("BENTOML_EMBEDDING_ENDPOINT")

if embedding_endpoint:
    # Use self-hosted embedding model via BentoML API endpoint
    client = Client.from_url(embedding_endpoint)
    embedding_func = BentoEmbeddings(client)
else:
    embedding_func = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))
    if os.getenv('OPENAI_API_TYPE') == 'azure':
        embedding_func = OpenAIEmbeddings(
            openai_api_key=os.getenv("OPENAI_API_KEY"),
            deployment=os.getenv(
                "OPENAI_API_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-ada-002"
            ),
            chunk_size=1)


def get_chroma():
    chroma = Chroma(
        collection_name='llm',
        embedding_function=embedding,
        embedding_function=embedding_func,
        persist_directory='./chroma.db'
    )
    return chroma
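
Callers of `get_chroma()` are unaffected by which embedding backend is active, since the selection happens once at module import. A minimal usage sketch (the `similarity_search` call is standard LangChain `Chroma` API rather than something this PR adds):

```python
from realtime_ai_character.database.chroma import get_chroma

# The query below is embedded via BentoEmbeddings when BENTOML_EMBEDDING_ENDPOINT
# is set, and via OpenAIEmbeddings otherwise; the calling code does not change.
chroma = get_chroma()
docs = chroma.similarity_search("Who is Elon Musk?", k=3)
for doc in docs:
    print(doc.page_content[:80])
```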
1 change: 1 addition & 0 deletions requirements.txt
@@ -2,6 +2,7 @@ aioconsole
aiofiles
alembic
anthropic
bentoml>=1.1
chromadb>=0.4.2
click
EbookLib
2 changes: 2 additions & 0 deletions sample_cloud_deployment/deployment.yaml
@@ -50,6 +50,8 @@ spec:
          value: <YOUR_AI_VOICE_ID>
        - name: BRUCE_VOICE
          value: <YOUR_AI_VOICE_ID>
        - name: BENTOML_EMBEDDING_ENDPOINT
          value: bentoml-embedding-service.<YOUR_DEPLOYMENT_NAMESPACE>.svc.cluster.local
---
apiVersion: v1
kind: Service
41 changes: 41 additions & 0 deletions sample_cloud_deployment/embedding_service.yaml
@@ -0,0 +1,41 @@
# For advanced BentoML deployment on kubernetes, see:
# https://www.kubeflow.org/docs/external-add-ons/serving/bentoml/
# https://github.com/bentoml/yatai
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bentoml-embedding-deployment
  labels:
    app: bentoml-text-embedding
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bentoml-text-embedding
  template:
    metadata:
      labels:
        app: bentoml-text-embedding
    spec:
      containers:
      - name: bentoml-text-embedding
        image: ghcr.io/bentoml/sentence-embedding-bento:0.1.0
        ports:
        - containerPort: 3000
        env:
        - name: BENTOML_CONFIG_OPTIONS
          value: "api_server.metrics.namespace=realchar,api_server.traffic.timeout=10"
---
apiVersion: v1
kind: Service
metadata:
  name: bentoml-embedding-service
spec:
  type: ClusterIP
  selector:
    app: bentoml-text-embedding
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
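
Port wiring recap: the container serves the BentoML API on 3000, the `ClusterIP` Service exposes it on port 80, and `deployment.yaml` above points `BENTOML_EMBEDDING_ENDPOINT` at that Service's cluster DNS name. Note that the local example in `.env.example` uses an explicit `http://` scheme; depending on how `bentoml.client.Client.from_url` parses the endpoint, the in-cluster value may need the same prefix.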