From 62f58504baea950457527f85250d81a33707ae7b Mon Sep 17 00:00:00 2001
From: writinwaters <93570324+writinwaters@users.noreply.github.com>
Date: Tue, 30 Apr 2024 19:53:20 +0800
Subject: [PATCH] Updated 0.1.0 benchmark (#1155)

### Type of change

- [x] Documentation Update
---
 docs/references/benchmark.md | 93 +++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/docs/references/benchmark.md b/docs/references/benchmark.md
index 3841bf99ea..c592271372 100644
--- a/docs/references/benchmark.md
+++ b/docs/references/benchmark.md
@@ -4,16 +4,97 @@ slug: /benchmark
 ---
 
 # Benchmark
+1. Install necessary dependencies.
+
+```bash
+pip install -r requirements.txt
+```
+
+2. Download the required Benchmark datasets to your **/datasets** folder:
+   - [SIFT1M](http://ann-benchmarks.com/sift-128-euclidean.hdf5)
+   - [GIST1M](http://ann-benchmarks.com/gist-960-euclidean.hdf5)
+   - [Dbpedia](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/dbpedia-entity.zip)
+   - [Enwiki](https://home.apache.org/~mikemccand/enwiki-20120502-lines-1k.txt.lzma)
+3. Start up the databases to compare:
+```bash
+docker compose up -d
+```
+4. Run Benchmark:
+
+   > Tasks of this Python script include:
+   >
+   > - Delete the original data.
+   > - Re-insert the data
+   > - Calculate the time to insert data and build index
+   > - Calculate QPS.
+   > - Calculate query latencies.
+```bash
+python run.py
+```
+5. Navigate to the **results** folder to view the results and latency of each query.
+
+## Benchmark Results
+### SIFT1M
+
+> - Metric: L2
+> - 10000 queries
+
+|                   | QPS   | Recall         | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
+| ----------------- | ----- | -------------- | ---------------------------- | ---------------------------- | ------ | ----------- |
+| **Elasticsearch** | 934   | 0.992          | 131 s                        | N/A                          | 874 MB | 1.463 GB    |
+| **Qdrant**        | 1303  | 0.979          | 46 s                         | N/A                          | 418 MB | 1.6 GB      |
+| **Infinity**      | 16320 | 0.973          | 74 s                         | 28 s                         | 792 MB | 0.95 GB     |
+
+
+
+### GIST1M
+
+> - Metric: L2
+> - 1000 queries
+
+|                   | QPS  | Recall         | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
+| ----------------- | ---- | -------------- | ---------------------------- | ---------------------------- | ------ | ----------- |
+| **Elasticsearch** | 305  | 0.885          | 872 s                        | N/A                          | 13 GB  | 6.9 GB      |
+| **Qdrant**        | 339  | 0.947          | 366 s                        | N/A                          | 4.4 GB | 7.3 GB      |
+| **Infinity**      | 2200 | 0.946          | 463 s                        | 112 s                        | 4.7 GB | 6.0 GB      |
+
+
+
+### Dbpedia
+
+> - 4160000 documents
+> - 467 queries
+
+|                   | QPS         | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
+| ----------------- | ----------- | ---------------------------- | ---------------------------- | ------ | ----------- |
+| **Elasticsearch** | 777         | 291 s                        | N/A                          | 2 GB   | 1.7 GB      |
+| **Infinity**      | 817         | 237 s                        | 123 s                        | 3.4 GB | 0.49 GB     |
+
+### Enwiki
+
+> - 33000000 documents
+> - 100 queries
+
+|                   | QPS         | Time to insert & build index | Time to import & build index | Disk  | Peak memory |
+| ----------------- | ----------- | ---------------------------- | ---------------------------- | ----- | ----------- |
+| **Elasticsearch** | 484         | 2289 s                       | N/A                          | 28 GB | 5.3 GB      |
+| **Infinity**      | 484         | 2321 s                       | 944 s                        | 54 GB | 5.1 GB      |
+
+
+---
+
+## Deprecated Benchmark
+
 Infinity provides a Python script for benchmarking the SIFT1M and GIST1M datasets.
 
-## Build and start Infinity
+### Build and start Infinity
 
 You have two options for building Infinity.
 Choose the option that best fits your needs:
 - [Build Infinity using Docker](https://github.com/infiniflow/infinity/blob/main/README.md)
 - [Build from source](../build_from_source.md)
 
-## Download the Benchmark datasets
+### Download the Benchmark datasets
 
 To obtain the benchmark datasets, you have the option to download them using the wget command.
 
@@ -42,7 +123,7 @@ mv gist/gist_groundtruth.ivecs test/data/benchmark/gist_1m/gist_groundtruth.ivec
 ```
 
 
-## Benchmark dependencies
+### Benchmark dependencies
 
 ```sh
 cd python
@@ -51,7 +132,7 @@ pip install -r requirements.txt
 pip install .
 ```
 
-## Import the Benchmark datasets
+### Import the Benchmark datasets
 
 ```sh
 cd benchmark
@@ -64,7 +145,7 @@ python remote_benchmark_knn_import.py -d sift_1m
 python remote_benchmark_knn_import.py -d gist_1m
 ```
 
-## Run Benchmark
+### Run Benchmark
 
 ```sh
 # options:
@@ -85,7 +166,7 @@ python remote_benchmark_knn.py -t 16 -r 1 -d sift_1m
 # Perform a latency benchmark on the GIST1M dataset using a single thread, running it only once.
 python remote_benchmark_knn.py -t 16 -r 1 -d gist_1m
 ```
-## A SIFT1M Benchmark report
+### A SIFT1M Benchmark report
 
 - **Hardware**: Intel i5-12500H, 16C, 16GB
 - **Operating system**: Ubuntu 22.04