---
sidebar_position: 1
slug: /benchmark
---
This document compares the following key specifications of Elasticsearch, Qdrant, and Infinity:
- QPS
- Recall
- Time to insert & build index
- Time to import & build index
- Disk usage
- Peak memory usage
|               | Version |
| ------------- | ------- |
| Elasticsearch | v8.13.0 |
| Qdrant        | v1.8.2  |
| Infinity      | v0.1.0  |
- Install the necessary dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the required benchmark datasets to your /datasets folder.
- Start up the databases to compare:

  ```bash
  docker compose up -d
  ```
- Run the benchmark:

  ```bash
  python run.py
  ```

  Tasks of this Python script include:

  - Delete the original data.
  - Re-insert the data.
  - Calculate the time to insert data and build the index.
  - Calculate QPS.
  - Calculate query latencies.

  A sketch of how these measurements can be taken follows this list.

- Navigate to the results folder to view the results and the latency of each query.
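For orientation, here is a minimal sketch of how the numbers reported below can be produced. `insert_all`, `build_index`, and `search` are hypothetical placeholders for the client calls to whichever database is being measured; they are not functions from run.py.

```python
import time

def benchmark(insert_all, build_index, search, queries, topk=100):
    """Time insertion + index build, then measure per-query latency and overall QPS."""
    t0 = time.perf_counter()
    insert_all()                                   # re-insert the dataset
    build_index()                                  # build the vector index
    insert_and_build_s = time.perf_counter() - t0

    latencies = []
    t0 = time.perf_counter()
    for q in queries:
        start = time.perf_counter()
        search(q, topk)                            # one top-k query
        latencies.append(time.perf_counter() - start)
    qps = len(queries) / (time.perf_counter() - t0)

    return insert_and_build_s, qps, latencies
```

QPS here is simply the number of queries divided by the wall-clock time of the query loop, and the per-query latencies are what you will find in the results folder.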
- Metric: L2
- 10000 queries
|               | QPS   | Recall | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
| ------------- | ----- | ------ | ---------------------------- | ---------------------------- | ------ | ----------- |
| Elasticsearch | 934   | 0.992  | 131 s                        | N/A                          | 874 MB | 1.463 GB    |
| Qdrant        | 1303  | 0.979  | 46 s                         | N/A                          | 418 MB | 1.6 GB      |
| Infinity      | 16320 | 0.973  | 74 s                         | 28 s                         | 792 MB | 0.95 GB     |
- Metric: L2
- 1000 queries
|               | QPS  | Recall | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
| ------------- | ---- | ------ | ---------------------------- | ---------------------------- | ------ | ----------- |
| Elasticsearch | 305  | 0.885  | 872 s                        | N/A                          | 13 GB  | 6.9 GB      |
| Qdrant        | 339  | 0.947  | 366 s                        | N/A                          | 4.4 GB | 7.3 GB      |
| Infinity      | 2200 | 0.946  | 463 s                        | 112 s                        | 4.7 GB | 6.0 GB      |
- 4160000 documents
- 467 queries
|               | QPS | Time to insert & build index | Time to import & build index | Disk   | Peak memory |
| ------------- | --- | ---------------------------- | ---------------------------- | ------ | ----------- |
| Elasticsearch | 777 | 291 s                        | N/A                          | 2 GB   | 1.7 GB      |
| Infinity      | 817 | 237 s                        | 123 s                        | 3.4 GB | 0.49 GB     |
- 33000000 documents
- 100 queries
|               | QPS | Time to insert & build index | Time to import & build index | Disk  | Peak memory |
| ------------- | --- | ---------------------------- | ---------------------------- | ----- | ----------- |
| Elasticsearch | 484 | 2289 s                       | N/A                          | 28 GB | 5.3 GB      |
| Infinity      | 484 | 2321 s                       | 944 s                        | 54 GB | 5.1 GB      |
Infinity provides a Python script for benchmarking the SIFT1M and GIST1M datasets.
You have two options for building Infinity: build it from source or run it with Docker. Choose the option that best fits your needs.
You can download the benchmark datasets using wget:

```bash
# Download the SIFT1M benchmark dataset.
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
# Download the GIST1M benchmark dataset.
wget ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz
```

Alternatively, you can download the benchmark datasets manually from http://corpus-texmex.irisa.fr/.
```bash
# Unzip and move the SIFT1M benchmark files.
tar -zxvf sift.tar.gz
mv sift/sift_base.fvecs test/data/benchmark/sift_1m/sift_base.fvecs
mv sift/sift_query.fvecs test/data/benchmark/sift_1m/sift_query.fvecs
mv sift/sift_groundtruth.ivecs test/data/benchmark/sift_1m/sift_groundtruth.ivecs

# Unzip and move the GIST1M benchmark files.
tar -zxvf gist.tar.gz
mv gist/gist_base.fvecs test/data/benchmark/gist_1m/gist_base.fvecs
mv gist/gist_query.fvecs test/data/benchmark/gist_1m/gist_query.fvecs
mv gist/gist_groundtruth.ivecs test/data/benchmark/gist_1m/gist_groundtruth.ivecs
```
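The `.fvecs`/`.ivecs` files use the TEXMEX format: every vector is stored as a 4-byte little-endian integer dimension followed by that many 4-byte components (float32 for `.fvecs`, int32 for `.ivecs`). The sketch below, with illustrative helper names that are not part of the benchmark scripts, shows how to load them and how recall@k is typically computed against the ground truth:

```python
import numpy as np

def read_ivecs(path: str) -> np.ndarray:
    """Read a .ivecs file: an int32 dimension followed by int32 components per vector."""
    raw = np.fromfile(path, dtype=np.int32)
    dim = raw[0]
    return raw.reshape(-1, dim + 1)[:, 1:].copy()

def read_fvecs(path: str) -> np.ndarray:
    """Read a .fvecs file: same layout, but the components are float32."""
    return read_ivecs(path).view(np.float32)

def recall_at_k(result_ids, groundtruth, k: int = 100) -> float:
    """Fraction of the true top-k neighbors found in the returned top-k ids."""
    hits = sum(len(set(r[:k]) & set(g[:k])) for r, g in zip(result_ids, groundtruth))
    return hits / (k * len(groundtruth))

if __name__ == "__main__":
    queries = read_fvecs("test/data/benchmark/sift_1m/sift_query.fvecs")
    gt = read_ivecs("test/data/benchmark/sift_1m/sift_groundtruth.ivecs")
    print(queries.shape, gt.shape)  # (10000, 128) and (10000, 100) for SIFT1M
```

The `.copy()` makes the sliced array contiguous so the float32 reinterpretation in `read_fvecs` is valid.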
Install the Python dependencies and the local Python package, then switch to the benchmark directory:

```bash
cd python
pip install -r requirements.txt
pip install .
cd benchmark
```
Import the benchmark data and build the index:

```bash
# options:
#   -h, --help            show this help message and exit
#   -d DATA_SET, --data DATA_SET

# Import the SIFT1M dataset and build the index.
python remote_benchmark_knn_import.py -d sift_1m
# Import the GIST1M dataset and build the index.
python remote_benchmark_knn_import.py -d gist_1m
```
Run the query benchmark:

```bash
# options:
#   -h, --help            show this help message and exit
#   -t THREADS, --threads THREADS
#   -r ROUNDS, --rounds ROUNDS
#   -d DATA_SET, --data DATA_SET
# ROUNDS indicates the number of times Python executes the benchmark; the result is the average duration per run.

# Perform a latency benchmark on the SIFT1M dataset using a single thread, running it only once.
python remote_benchmark_knn.py -t 1 -r 1 -d sift_1m
# Perform a latency benchmark on the GIST1M dataset using a single thread, running it only once.
python remote_benchmark_knn.py -t 1 -r 1 -d gist_1m
# Perform a QPS benchmark on the SIFT1M dataset using 16 threads, running it only once.
python remote_benchmark_knn.py -t 16 -r 1 -d sift_1m
# Perform a QPS benchmark on the GIST1M dataset using 16 threads, running it only once.
python remote_benchmark_knn.py -t 16 -r 1 -d gist_1m
```
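When `-t` is greater than 1, QPS is measured by issuing queries from multiple client threads concurrently. Below is a minimal sketch of that kind of measurement, where `run_query` is a hypothetical stand-in for one top-k request to the server, not the script's actual API:

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def qps_benchmark(run_query, queries, threads: int = 16):
    """Issue all queries from a thread pool; report overall QPS and P99 latency."""
    latencies = []

    def timed(q):
        start = time.perf_counter()
        run_query(q)                                   # one top-k search request
        latencies.append(time.perf_counter() - start)  # list.append is thread-safe in CPython

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    qps = len(queries) / wall
    p99_latency = statistics.quantiles(latencies, n=100)[98]  # 99th-percentile latency
    return qps, p99_latency
```

When `-r` is greater than 1, the same measurement is repeated and the reported figure is the average over the rounds.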
Reference results for Infinity:

- Hardware: Intel i5-12500H, 16C, 16 GB RAM
- Operating system: Ubuntu 22.04
- Dataset: SIFT1M; topk: 100; recall: 97%+
- P99 QPS: 15,688 (16 clients)
- P99 Latency: 0.36 ms
- Memory usage: 408 MB