Improve ObjectAPI documentation
NikolaosPapailiou committed Oct 1, 2024
1 parent 569ab1a commit cce7ec1
Showing 5 changed files with 42 additions and 46 deletions.
8 changes: 6 additions & 2 deletions _quarto.yml
@@ -36,17 +36,21 @@ quartodoc:
- title: "Vector API"
desc: ""
contents:
- open
- ingestion
- index.Index
- subtitle: "Algorithms"
desc: ""
contents:
- flat_index
- ivf_flat_index
- vamana_index
- ivf_pq_index
- ingestion
- title: "Object API"
desc: ""
contents:
- object_api.create
- object_api.ObjectIndex
- object_api.ingest_embeddings_with_driver
- embeddings.ObjectEmbedding
- object_readers.ObjectReader
- object_readers.ObjectPartition
45 changes: 21 additions & 24 deletions apis/python/src/tiledb/vector_search/index.py
@@ -21,15 +21,14 @@
class Index(metaclass=ABCMeta):
"""
Abstract Vector Index class.
Do not use this directly but rather use the `open` factory method.
All Vector Index algorithm implementations are subclasses of this class. Apart
from the abstract method interfaces, `Index` provides implementations for common
tasks such as supporting updates, time-traveling, and metadata management.
Opens an `Index` reading metadata and applying time-traveling options.
Do not use this directly but rather instantiate the concrete Index classes.
Parameters
----------
uri: str
@@ -883,35 +882,33 @@ def create_metadata(
group.close()


"""
Factory method that opens a vector index.
Retrieves the `index_type` from the index group metadata and instantiates the appropriate `Index` subclass.
Parameters
----------
uri: str
URI of the index.
config: Optional[Mapping[str, Any]]
TileDB config dictionary.
timestamp: int or tuple(int)
If int, open the index at a given timestamp.
If tuple, open at the given start and end timestamps.
open_for_remote_query_execution: bool
If `True`, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-`None` `driver_mode` is passed to `query()`.
If `False`, load index data in main memory locally. Note that you can still use a taskgraph for query execution; you'll just end up loading the data both on your local machine and in the cloud taskgraph.
kwargs:
Additional arguments to be passed to the `Index` subclass constructor.
"""


def open(
uri: str,
open_for_remote_query_execution: bool = False,
config: Optional[Mapping[str, Any]] = None,
timestamp=None,
**kwargs,
) -> Index:
"""
Factory method that opens a vector index.
Retrieves the `index_type` from the index group metadata and instantiates the appropriate `Index` subclass.
Parameters
----------
uri: str
URI of the index.
config: Optional[Mapping[str, Any]]
TileDB config dictionary.
timestamp: int or tuple(int)
If int, open the index at a given timestamp.
If tuple, open at the given start and end timestamps.
open_for_remote_query_execution: bool
If `True`, do not load any index data in main memory locally, and instead load index data in the TileDB Cloud taskgraph created when a non-`None` `driver_mode` is passed to `query()`.
If `False`, load index data in main memory locally. Note that you can still use a taskgraph for query execution; you'll just end up loading the data both on your local machine and in the cloud taskgraph.
kwargs:
Additional arguments to be passed to the `Index` subclass constructor.
"""
from tiledb.vector_search.flat_index import FlatIndex
from tiledb.vector_search.ivf_flat_index import IVFFlatIndex
from tiledb.vector_search.ivf_pq_index import IVFPQIndex
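The `open` docstring above describes a factory that reads `index_type` from the index group metadata and instantiates the matching `Index` subclass. The following is a minimal, self-contained sketch of that dispatch pattern only. It does not use the library: the stub classes, the `_REGISTRY` mapping, the `open_index` name, and the `read_index_type` callback are all hypothetical stand-ins, and the real function reads the metadata from the TileDB group itself.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple, Union

# Timestamp is either a single int or a (start, end) tuple, as in the docstring.
Timestamp = Union[int, Tuple[int, int], None]


class IndexStub:
    """Hypothetical stand-in for a concrete index class."""

    def __init__(
        self,
        uri: str,
        config: Optional[Mapping[str, Any]] = None,
        timestamp: Timestamp = None,
        **kwargs: Any,
    ) -> None:
        self.uri = uri
        self.config = dict(config or {})
        self.timestamp = timestamp
        self.kwargs = kwargs


class FlatIndexStub(IndexStub):
    pass


class IVFFlatIndexStub(IndexStub):
    pass


# Hypothetical registry: metadata value -> class to instantiate.
_REGISTRY: Dict[str, type] = {
    "FLAT": FlatIndexStub,
    "IVF_FLAT": IVFFlatIndexStub,
}


def open_index(
    uri: str,
    read_index_type: Callable[[str], str],
    config: Optional[Mapping[str, Any]] = None,
    timestamp: Timestamp = None,
    **kwargs: Any,
) -> IndexStub:
    """Look up the stored index type and build the matching class."""
    index_type = read_index_type(uri)  # assumption: injected metadata reader
    try:
        cls = _REGISTRY[index_type]
    except KeyError:
        raise ValueError(f"Unsupported index type {index_type!r}") from None
    # Extra keyword arguments are forwarded to the subclass constructor.
    return cls(uri=uri, config=config, timestamp=timestamp, **kwargs)
```

Callers of such a factory never name the concrete class; they pass a URI and optional time-travel settings, for example `open_index("my_index", reader, timestamp=(1, 5))`, and receive whichever subclass the metadata selects.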
3 changes: 2 additions & 1 deletion apis/python/src/tiledb/vector_search/object_api/__init__.py
@@ -1,4 +1,5 @@
from .embeddings_ingestion import ingest_embeddings_with_driver
from .object_index import ObjectIndex
from .object_index import create

__all__ = ["ObjectIndex", "ingest_embeddings_with_driver"]
__all__ = ["ObjectIndex", "create", "ingest_embeddings_with_driver"]
@@ -36,19 +36,16 @@ def ingest_embeddings_with_driver(
a TileDB Cloud DAG (Directed Acyclic Graph). The DAG consists of two main stages:
1. **Embeddings Generation:** This stage is responsible for computing embeddings
for the objects to be indexed. It can be executed in one of three modes:
- **LOCAL:** Embeddings are generated locally within the current process.
- **REALTIME:** Embeddings are generated using a TileDB Cloud REALTIME TaskGraph.
- **BATCH:** Embeddings are generated using a TileDB Cloud BATCH TaskGraph.
for the objects to be indexed.
2. **Vector Indexing:** This stage is responsible for ingesting the generated
embeddings into the TileDB vector search index. It can be executed in one of
three modes:
embeddings into the TileDB vector search index.
Both stages can be executed in one of three modes:
- **LOCAL:** Embeddings are ingested locally within the current process.
- **REALTIME:** Embeddings are ingested using a TileDB Cloud REALTIME TaskGraph.
- **BATCH:** Embeddings are ingested using a TileDB Cloud BATCH TaskGraph.
- **LOCAL:** Embeddings are ingested locally within the current process.
- **REALTIME:** Embeddings are ingested using a TileDB Cloud REALTIME TaskGraph.
- **BATCH:** Embeddings are ingested using a TileDB Cloud BATCH TaskGraph.
The `ingest_embeddings_with_driver` function provides flexibility in configuring
the execution environment for both stages. Users can specify the number of workers,
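The docstring above describes two stages, embeddings generation followed by vector indexing, each runnable in LOCAL, REALTIME, or BATCH mode. As a rough illustration of that structure, here is a self-contained sketch. The `Mode` enum, the `plan_stages` helper, and the idea of choosing a mode per stage are assumptions made for this example and are not taken from the library.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    """Hypothetical stand-in for the three execution modes."""

    LOCAL = "local"        # run inside the current process
    REALTIME = "realtime"  # run as a TileDB Cloud REALTIME TaskGraph
    BATCH = "batch"        # run as a TileDB Cloud BATCH TaskGraph


def plan_stages(
    embeddings_generation_mode: Mode = Mode.LOCAL,
    vector_indexing_mode: Mode = Mode.LOCAL,
) -> List[Tuple[str, Mode]]:
    """Return the two stages in execution order with their chosen modes."""
    for mode in (embeddings_generation_mode, vector_indexing_mode):
        if not isinstance(mode, Mode):
            raise TypeError(f"Expected a Mode, got {type(mode).__name__}")
    # Embeddings must exist before they can be ingested into the index,
    # so generation always comes first.
    return [
        ("embeddings_generation", embeddings_generation_mode),
        ("vector_indexing", vector_indexing_mode),
    ]
```

For instance, `plan_stages(Mode.BATCH, Mode.LOCAL)` would describe generating embeddings in a batch task graph and then ingesting them in the current process.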
15 changes: 6 additions & 9 deletions apis/python/src/tiledb/vector_search/object_api/object_index.py
@@ -375,7 +375,7 @@ def update_object_reader(
group.meta["object_reader_kwargs"] = self.object_reader_kwargs
group.close()

def create_embeddings_partitioned_array(self) -> (str, str):
def _create_embeddings_partitioned_array(self) -> (str, str):
"""
Creates a partitioned array for storing embeddings.
@@ -477,15 +477,12 @@ def update_index(
a TileDB Cloud DAG (Directed Acyclic Graph). The DAG consists of two main stages:
1. **Embeddings Generation:** This stage is responsible for computing embeddings
for the objects to be indexed. It can be executed in one of three modes:
- **LOCAL:** Embeddings are generated locally within the current process.
- **REALTIME:** Embeddings are generated using a TileDB Cloud REALTIME TaskGraph.
- **BATCH:** Embeddings are generated using a TileDB Cloud BATCH TaskGraph.
for the objects to be indexed.
2. **Vector Indexing:** This stage is responsible for ingesting the generated
embeddings into the TileDB vector search index. It can be executed in one of
three modes:
embeddings into the TileDB vector search index.
Both stages can be executed in one of three modes:
- **LOCAL:** Embeddings are ingested locally within the current process.
- **REALTIME:** Embeddings are ingested using a TileDB Cloud REALTIME TaskGraph.
@@ -549,7 +546,7 @@ def update_index(
(
temp_dir_name,
embeddings_array_uri,
) = self.create_embeddings_partitioned_array()
) = self._create_embeddings_partitioned_array()
use_updates_array = False

storage_formats[self.index.storage_version]["EXTERNAL_IDS_ARRAY_NAME"]
