
Commit 5caa381

bovlb and gsaluja9 authored
community[minor]: Add ApertureDB as a vectorstore (langchain-ai#24088)
Thank you for contributing to LangChain!

- [X] **ApertureDB as vectorstore**: "community: Add ApertureDB as a vectorstore"
  - **Description:** This change provides a new community integration that uses ApertureData's ApertureDB as a vector store.
  - **Issue:** none
  - **Dependencies:** depends on the ApertureDB Python SDK
  - **Twitter handle:** ApertureData
- [X] **Add tests and docs**: If you're adding a new integration, please include
  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use. It lives in the `docs/docs/integrations` directory.

  Integration tests rely on a local run of a public docker image. The example notebook additionally relies on a local Ollama server.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ All lint tests pass.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Gautam <gautam@aperturedata.io>
1 parent c59e663 commit 5caa381
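As a quick orientation before the diffs, here is a minimal sketch of how the new vectorstore is used once this commit is in place. It mirrors the example notebook added below; the embedding model is an arbitrary choice, and it assumes a local ApertureDB instance is running and configured (see the notebook's setup cells).

```python
# Minimal sketch of the new community integration (assumes a local ApertureDB
# instance is running and `adb config create local --active` has been run).
from langchain_community.embeddings import OllamaEmbeddings  # any embeddings model works
from langchain_community.vectorstores import ApertureDB

embeddings = OllamaEmbeddings()

# Stores one embedding ("descriptor") per text in a descriptor set,
# named "langchain" by default.
vector_db = ApertureDB.from_texts(
    ["ApertureDB stores text, images, videos, and embeddings."],
    embedding=embeddings,
)

# Standard LangChain VectorStore query interface.
print(vector_db.similarity_search("What does ApertureDB store?", k=1)[0].page_content)
```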

File tree

8 files changed, +871 -4 lines changed

.devcontainer/docker-compose.yaml (+2 -4)
@@ -5,10 +5,10 @@ services:
       dockerfile: libs/langchain/dev.Dockerfile
       context: ..
     volumes:
-      # Update this to wherever you want VS Code to mount the folder of your project
+      # Update this to wherever you want VS Code to mount the folder of your project
       - ..:/workspaces/langchain:cached
     networks:
-      - langchain-network
+      - langchain-network
     # environment:
     #   MONGO_ROOT_USERNAME: root
     #   MONGO_ROOT_PASSWORD: example123
@@ -28,5 +28,3 @@ services:
 networks:
   langchain-network:
     driver: bridge
-
-
@@ -0,0 +1,310 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# ApertureDB\n",
    "\n",
    "[ApertureDB](https://docs.aperturedata.io) is a database that stores, indexes, and manages multi-modal data like text, images, videos, bounding boxes, and embeddings, together with their associated metadata.\n",
    "\n",
    "This notebook explains how to use the embeddings functionality of ApertureDB."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7393beb",
   "metadata": {},
   "source": [
    "## Install ApertureDB Python SDK\n",
    "\n",
    "This installs the [Python SDK](https://docs.aperturedata.io/category/aperturedb-python-sdk) used to write client code for ApertureDB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install --upgrade --quiet aperturedb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fe12f77",
   "metadata": {},
   "source": [
    "## Run an ApertureDB instance\n",
    "\n",
    "To continue, you should have an [ApertureDB instance up and running](https://docs.aperturedata.io/HowToGuides/start/Setup) and have configured your environment to use it.\n",
    "There are various ways to do that, for example:\n",
    "\n",
    "```bash\n",
    "docker run --publish 55555:55555 aperturedata/aperturedb-standalone\n",
    "adb config create local --active --no-interactive\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "667eabca",
   "metadata": {},
   "source": [
    "## Download some web documents\n",
    "We're going to do a mini-crawl here of one web page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0798dfdb",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
     ]
    }
   ],
   "source": [
    "# For loading documents from web\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\"https://docs.aperturedata.io\")\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f077d11",
   "metadata": {},
   "source": [
    "## Select embeddings model\n",
    "\n",
    "We want to use OllamaEmbeddings, so we have to import the necessary modules.\n",
    "\n",
    "Ollama can be set up as a docker container as described in the [documentation](https://hub.docker.com/r/ollama/ollama), for example:\n",
    "```bash\n",
    "# Run server\n",
    "docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama\n",
    "# Tell server to load a specific model\n",
    "docker exec ollama ollama run llama2\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain_community.embeddings import OllamaEmbeddings\n",
    "\n",
    "embeddings = OllamaEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7b313e6",
   "metadata": {},
   "source": [
    "## Split documents into segments\n",
    "\n",
    "We want to turn our single document into multiple segments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3c4b7b31",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter()\n",
    "documents = text_splitter.split_documents(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46339d32",
   "metadata": {},
   "source": [
    "## Create vectorstore from documents and embeddings\n",
    "\n",
    "This code creates a vectorstore in the ApertureDB instance.\n",
    "Within the instance, this vectorstore is represented as a \"[descriptor set](https://docs.aperturedata.io/category/descriptorset-commands)\".\n",
    "By default, the descriptor set is named `langchain`. The following code will generate embeddings for each document and store them in ApertureDB as descriptors. This will take a few seconds as the embeddings are being generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "dcf88bdf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain_community.vectorstores import ApertureDB\n",
    "\n",
    "vector_db = ApertureDB.from_documents(documents, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7672877b",
   "metadata": {},
   "source": [
    "## Select a large language model\n",
    "\n",
    "Again, we use the Ollama server we set up for local processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9a005e4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.llms import Ollama\n",
    "\n",
    "llm = Ollama(model=\"llama2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd54f2ad",
   "metadata": {},
   "source": [
    "## Build a RAG chain\n",
    "\n",
    "Now we have all the components we need to create a RAG (Retrieval-Augmented Generation) chain. This chain does the following:\n",
    "1. Generate embedding descriptor for user query\n",
    "2. Find text segments that are similar to the user query using the vector store\n",
    "3. Pass user query and context documents to the LLM using a prompt template\n",
    "4. Return the LLM's answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a8c513ab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the provided context, ApertureDB can store images. In fact, it is specifically designed to manage multimodal data such as images, videos, documents, embeddings, and associated metadata including annotations. So, ApertureDB has the capability to store and manage images.\n"
     ]
    }
   ],
   "source": [
    "# Create prompt\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\"\"\"Answer the following question based only on the provided context:\n",
    "\n",
    "<context>\n",
    "{context}\n",
    "</context>\n",
    "\n",
    "Question: {input}\"\"\")\n",
    "\n",
    "\n",
    "# Create a chain that passes documents to an LLM\n",
    "from langchain.chains.combine_documents import create_stuff_documents_chain\n",
    "\n",
    "document_chain = create_stuff_documents_chain(llm, prompt)\n",
    "\n",
    "\n",
    "# Treat the vectorstore as a document retriever\n",
    "retriever = vector_db.as_retriever()\n",
    "\n",
    "\n",
    "# Create a RAG chain that connects the retriever to the LLM\n",
    "from langchain.chains import create_retrieval_chain\n",
    "\n",
    "retrieval_chain = create_retrieval_chain(retriever, document_chain)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bc6a882",
   "metadata": {},
   "source": [
    "## Run the RAG chain\n",
    "\n",
    "Finally, we pass a question to the chain and get our answer. This will take a few seconds to run as the LLM generates an answer from the query and context documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "020f29f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the provided context, ApertureDB can store images in several ways:\n",
      "\n",
      "1. Multimodal data management: ApertureDB offers a unified interface to manage multimodal data such as images, videos, documents, embeddings, and associated metadata including annotations. This means that images can be stored along with other types of data in a single database instance.\n",
      "2. Image storage: ApertureDB provides image storage capabilities through its integration with the public cloud providers or on-premise installations. This allows customers to host their own ApertureDB instances and store images on their preferred cloud provider or on-premise infrastructure.\n",
      "3. Vector database: ApertureDB also offers a vector database that enables efficient similarity search and classification of images based on their semantic meaning. This can be useful for applications where image search and classification are important, such as in computer vision or machine learning workflows.\n",
      "\n",
      "Overall, ApertureDB provides flexible and scalable storage options for images, allowing customers to choose the deployment model that best suits their needs.\n"
     ]
    }
   ],
   "source": [
    "user_query = \"How can ApertureDB store images?\"\n",
    "response = retrieval_chain.invoke({\"input\": user_query})\n",
    "print(response[\"answer\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
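The notebook exercises the vector store only through the retrieval chain; for reference, the retrieval step boils down to a plain similarity search against the descriptor set. A short sketch, reusing the `vector_db` object created in the notebook (the query and `k` value are arbitrary):

```python
# Hypothetical follow-up cell: query the ApertureDB vectorstore directly,
# bypassing the RAG chain. `similarity_search` is part of LangChain's base
# VectorStore interface.
results = vector_db.similarity_search("How can ApertureDB store images?", k=4)
for doc in results:
    print(doc.page_content[:200], "\n---")
```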

libs/community/langchain_community/vectorstores/__init__.py (+5 -0)
@@ -43,6 +43,9 @@
     from langchain_community.vectorstores.apache_doris import (
         ApacheDoris,
     )
+    from langchain_community.vectorstores.aperturedb import (
+        ApertureDB,
+    )
     from langchain_community.vectorstores.astradb import (
         AstraDB,
     )
@@ -311,6 +314,7 @@
     "AnalyticDB",
     "Annoy",
     "ApacheDoris",
+    "ApertureDB",
     "AstraDB",
     "AtlasDB",
     "AwaDB",
@@ -413,6 +417,7 @@
     "AnalyticDB": "langchain_community.vectorstores.analyticdb",
     "Annoy": "langchain_community.vectorstores.annoy",
     "ApacheDoris": "langchain_community.vectorstores.apache_doris",
+    "ApertureDB": "langchain_community.vectorstores.aperturedb",
     "AstraDB": "langchain_community.vectorstores.astradb",
     "AtlasDB": "langchain_community.vectorstores.atlas",
     "AwaDB": "langchain_community.vectorstores.awadb",

0 commit comments
