Small changes to #173 so that it will merge cleanly with main #203
…and_timetravel and test_ivf_flat_ingestion_with_additions_and_timetravel
This pull request has been linked to Shortcut Story #38296: Add support for time traveling in C++ IVF Index, without updates.
@@ -34,6 +34,9 @@ def get_cmake_overrides():
    if val:
        conf.append("-DUSE_MKL_CBLAS={}".format(val))

    conf.append("-DTileDB_DIR=/Users/lums/Contrib/dist")
Please remove all local paths (/Users/lums).
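One way to avoid the hardcoded path, sketched here as an assumption rather than the project's actual build code, is to read the TileDB install prefix from an environment variable and add the flag only when it is set. The variable name `TILEDB_PATH` and the helper `tiledb_dir_override` are hypothetical names chosen for illustration.

```python
import os


def tiledb_dir_override(environ=None):
    """Return extra CMake flags for a user-supplied TileDB install prefix.

    TILEDB_PATH is a hypothetical variable name used only in this sketch.
    """
    environ = os.environ if environ is None else environ
    conf = []
    val = environ.get("TILEDB_PATH")
    if val:
        # Only override TileDB_DIR when the caller asked for it explicitly.
        conf.append("-DTileDB_DIR={}".format(val))
    return conf
```

With this shape, a developer can still point the build at a local install (for example by exporting the variable in their shell) without committing a machine-specific path.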
namespace py = pybind11;

// See https://chat.openai.com/share/0ec55abe-f5be-4988-a99b-017f27a1e129
Why is this here? Please remove it.
@@ -399,7 +407,12 @@ def dist_qv_udf(
    aqt = []
    for ttt in range(len(active_queries[tt])):
        aqt.append(active_queries[tt][ttt])
    #aq.append(np.array(aqt))
Please clean up all commented-out testing code.
@@ -495,7 +508,7 @@ def create(
        centroids_array_rows_dim, centroids_array_cols_dim
    )
    centroids_attr = tiledb.Attr(
        name="centroids",
        name="values",
Is this going to lead to a backwards compatibility issue?
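If the rename is kept, arrays written before the change would presumably still carry the old attribute name. A minimal sketch of a reader-side fallback, assuming the reader can list the attribute names present in an array's schema; `resolve_attr_name` is a hypothetical helper, not part of the TileDB API:

```python
def resolve_attr_name(available, candidates=("values", "centroids")):
    """Pick the first candidate attribute name that exists in `available`.

    `available` is any iterable of attribute names taken from an array schema.
    The candidate order (new name first, old name second) is an assumption.
    """
    present = set(available)
    for name in candidates:
        if name in present:
            return name
    raise KeyError(
        "none of {} found among attributes {}".format(candidates, sorted(present))
    )
```

A reader would call this once per array and use the returned name for all subsequent attribute access, so both old and new arrays stay readable.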
fmnistsmall_groundtruth_uri = fmnistsmall_root + "groundtruth"

'''
m1_root = "/Users/lums/TileDB/TileDB-Vector-Search/external/test_data/arrays/"
Remove all local file paths
@@ -1 +1 @@
{"cells":[{"cell_type":"code","execution_count":1,"id":"fff9da03-af7c-436e-ac56-6b7a8fe1d453","metadata":{"trusted":true},"outputs":[],"source":["import tiledb\n","from tiledb.cloud import client\n","import tiledb.vector_search as vs\n","from tiledb.vector_search.utils import *\n","\n","import numpy as np\n","import random\n","import sklearn\n","import string"]},{"cell_type":"code","execution_count":2,"id":"b83f2a9e-9af4-46bb-9513-fa5e302a4647","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["--2023-10-06 11:03:33-- https://github.com/TileDB-Inc/TileDB-Vector-Search/releases/download/0.0.1/siftsmall.tgz\n","Resolving github.com (github.com)... 140.82.112.4\n","Connecting to github.com (github.com)|140.82.112.4|:443... connected.\n","HTTP request sent, awaiting response... 302 Found\n","Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/627523373/b1990696-797c-4876-86c9-24cb101f7922?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231006%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231006T110333Z&X-Amz-Expires=300&X-Amz-Signature=7be26420dc408c0519e72dbc3ced4d62439d4fbefd61b40ccba35a28cb3422fa&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=627523373&response-content-disposition=attachment%3B%20filename%3Dsiftsmall.tgz&response-content-type=application%2Foctet-stream [following]\n","--2023-10-06 11:03:33-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/627523373/b1990696-797c-4876-86c9-24cb101f7922?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231006%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231006T110333Z&X-Amz-Expires=300&X-Amz-Signature=7be26420dc408c0519e72dbc3ced4d62439d4fbefd61b40ccba35a28cb3422fa&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=627523373&response-content-disposition=attachment%3B%20filename%3Dsiftsmall.tgz&response-content-type=application%2Foctet-stream\n","Resolving 
objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...\n","Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 5313773 (5.1M) [application/octet-stream]\n","Saving to: ‘siftsmall.tgz.10’\n","\n","siftsmall.tgz.10 100%[===================>] 5.07M --.-KB/s in 0.02s \n","\n","2023-10-06 11:03:33 (318 MB/s) - ‘siftsmall.tgz.10’ saved [5313773/5313773]\n","\n"]}],"source":["!cd /tmp && wget https://github.com/TileDB-Inc/TileDB-Vector-Search/releases/download/0.0.1/siftsmall.tgz\n","!cd /tmp && tar xf siftsmall.tgz"]},{"cell_type":"code","execution_count":3,"id":"d8e5151b-9672-4001-97eb-d56c76c1b8be","metadata":{"trusted":true},"outputs":[],"source":["def delete_if_exists(uri):\n"," try:\n"," group = tiledb.Group(uri, \"m\")\n"," except tiledb.TileDBError as err:\n"," message = str(err)\n"," if \"group does not exist\" in message:\n"," return\n"," else:\n"," raise err\n"," group.delete()\n"]},{"cell_type":"code","execution_count":4,"id":"b65fbb15-ec7a-4868-95f3-2a7da577b9cf","metadata":{"trusted":true},"outputs":[],"source":["namespace=client.default_user().username\n","random_suffix = \"\".join(random.choices(string.ascii_letters, k=10))\n","\n","# Use this in staging notebook\n","# index_uri = f\"tiledb://TileDB-Inc/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_flat\"\n","# ivf_index_uri = f\"tiledb://TileDB-Inc/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_ivf_flat\"\n","\n","# Use this for local tests\n","index_uri = f\"tiledb://{namespace}/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_flat_{random_suffix}\"\n","ivf_index_uri = f\"tiledb://{namespace}/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_ivf_flat_{random_suffix}\"\n","\n","source_uri = 
\"/tmp/siftsmall_base.fvecs\"\n","\n","delete_if_exists(index_uri)\n","delete_if_exists(ivf_index_uri)"]},{"cell_type":"code","execution_count":5,"id":"42ab8a5d-7888-47ef-b663-9fe84780332e","metadata":{"trusted":true},"outputs":[],"source":["flat_index = vs.ingest(\n"," index_type = \"FLAT\",\n"," index_uri = index_uri,\n"," source_uri = source_uri,\n",")"]},{"cell_type":"code","execution_count":6,"id":"5cdd3663-4b16-4731-a61d-9fd6be49c418","metadata":{"trusted":true},"outputs":[],"source":["ivf_flat_index = vs.ingest(\n"," index_type=\"IVF_FLAT\",\n"," source_uri=source_uri,\n"," index_uri=ivf_index_uri,\n",")"]},{"cell_type":"code","execution_count":7,"id":"87c535e9-d10b-436e-a098-4a376b74711f","metadata":{"trusted":true},"outputs":[],"source":["# Get query vectors with ground truth\n","query_vectors = load_fvecs(\"/tmp/siftsmall_query.fvecs\")\n","ground_truth = load_ivecs(\"/tmp/siftsmall_groundtruth.ivecs\")"]},{"cell_type":"code","execution_count":8,"id":"0efa4265-aabf-4f5f-8ea2-1c7081d70e47","metadata":{"trusted":true},"outputs":[],"source":["def accuracy(result, gt):\n"," found = 0\n"," total = 0\n"," i = 0\n"," for r in result:\n"," total += len(r)\n"," found += len(np.intersect1d(r, gt[i]))\n"," i += 1\n"," return found / total"]},{"cell_type":"code","execution_count":9,"id":"cdc67d8d-cea3-4017-b42b-26952a2a84a6","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 1.0\n"]}],"source":["# Return the 100 most similar vectors to the query vectors with FLAT\n","result_d, result_i = flat_index.query(query_vectors, k=100)\n","ac = accuracy(result_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac == 1.0\n","\n"]},{"cell_type":"code","execution_count":10,"id":"5217e86d-a30d-4a16-b39b-c31d123786be","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 0.9204\n"]}],"source":["# Return the 100 most similar vectors to the query vectors with IVF_FLAT\n","# (you can set 
the nprobe parameter)\n","result_ivf_d, result_ivf_i = ivf_flat_index.query(query_vectors, nprobe=10, k=100)\n","ac = accuracy(result_ivf_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac >= 0.85"]},{"cell_type":"code","execution_count":null,"id":"7fee6898-9369-43dd-92fa-067bac9df452","metadata":{"trusted":true},"outputs":[],"source":["# Test distributed query\n","result_ivf_d, result_ivf_i = ivf_flat_index.query(query_vectors, nprobe=10, k=100, mode=tiledb.cloud.dag.Mode.BATCH, num_partitions=2)\n","ac = accuracy(result_ivf_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac >= 0.85"]},{"cell_type":"code","execution_count":null,"id":"662974e8","metadata":{},"outputs":[],"source":["delete_if_exists(index_uri)\n","delete_if_exists(ivf_index_uri)"]}],"metadata":{"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.18"}},"nbformat":4,"nbformat_minor":5} | |||
{"cells":[{"cell_type":"code","execution_count":1,"id":"fff9da03-af7c-436e-ac56-6b7a8fe1d453","metadata":{"trusted":true},"outputs":[],"source":["import tiledb\n","from tiledb.cloud import client\n","import tiledb.vector_search as vs\n","from tiledb.vector_search.utils import *\n","\n","import numpy as np\n","import random\n","import sklearn\n","import string"]},{"cell_type":"code","execution_count":2,"id":"b83f2a9e-9af4-46bb-9513-fa5e302a4647","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["--2023-10-06 11:03:33-- https://github.com/TileDB-Inc/TileDB-Vector-Search/releases/download/0.0.1/siftsmall.tgz\n","Resolving github.com (github.com)... 140.82.112.4\n","Connecting to github.com (github.com)|140.82.112.4|:443... connected.\n","HTTP request sent, awaiting response... 302 Found\n","Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/627523373/b1990696-797c-4876-86c9-24cb101f7922?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231006%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231006T110333Z&X-Amz-Expires=300&X-Amz-Signature=7be26420dc408c0519e72dbc3ced4d62439d4fbefd61b40ccba35a28cb3422fa&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=627523373&response-content-disposition=attachment%3B%20filename%3Dsiftsmall.tgz&response-content-type=application%2Foctet-stream [following]\n","--2023-10-06 11:03:33-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/627523373/b1990696-797c-4876-86c9-24cb101f7922?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231006%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231006T110333Z&X-Amz-Expires=300&X-Amz-Signature=7be26420dc408c0519e72dbc3ced4d62439d4fbefd61b40ccba35a28cb3422fa&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=627523373&response-content-disposition=attachment%3B%20filename%3Dsiftsmall.tgz&response-content-type=application%2Foctet-stream\n","Resolving 
objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...\n","Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 5313773 (5.1M) [application/octet-stream]\n","Saving to: ‘siftsmall.tgz.10’\n","\n","siftsmall.tgz.10 100%[===================>] 5.07M --.-KB/s in 0.02s \n","\n","2023-10-06 11:03:33 (318 MB/s) - ‘siftsmall.tgz.10’ saved [5313773/5313773]\n","\n"]}],"source":["!cd /tmp && wget https://github.com/TileDB-Inc/TileDB-Vector-Search/releases/download/0.0.1/siftsmall.tgz\n","!cd /tmp && tar xf siftsmall.tgz"]},{"cell_type":"code","execution_count":3,"id":"d8e5151b-9672-4001-97eb-d56c76c1b8be","metadata":{"trusted":true},"outputs":[],"source":["def delete_if_exists(uri):\n"," try:\n"," group = tiledb.Group(uri, \"m\")\n"," except tiledb.TileDBError as err:\n"," message = str(err)\n"," if \"group does not exist\" in message:\n"," return\n"," else:\n"," raise err\n"," group.delete()\n"]},{"cell_type":"code","execution_count":4,"id":"b65fbb15-ec7a-4868-95f3-2a7da577b9cf","metadata":{"trusted":true},"outputs":[],"source":["namespace=client.default_user().username\n","random_suffix = \"\".join(random.choices(string.ascii_letters, k=10))\n","\n","# Use this in staging notebook\n","# sift_index_uri = f\"tiledb://TileDB-Inc/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_flat\"\n","# ivf_index_uri = f\"tiledb://TileDB-Inc/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_ivf_flat\"\n","\n","# Use this for local tests\n","sift_index_uri = f\"tiledb://{namespace}/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_flat_{random_suffix}\"\n","ivf_index_uri = f\"tiledb://{namespace}/s3://tiledb-unittest/groups/unit-tests/vector_search/{namespace}/sift10k_ivf_flat_{random_suffix}\"\n","\n","source_uri = 
\"/tmp/siftsmall_base.fvecs\"\n","\n","delete_if_exists(sift_index_uri)\n","delete_if_exists(ivf_index_uri)"]},{"cell_type":"code","execution_count":5,"id":"42ab8a5d-7888-47ef-b663-9fe84780332e","metadata":{"trusted":true},"outputs":[],"source":["flat_index = vs.ingest(\n"," index_type = \"FLAT\",\n"," sift_index_uri = sift_index_uri,\n"," source_uri = source_uri,\n",")"]},{"cell_type":"code","execution_count":6,"id":"5cdd3663-4b16-4731-a61d-9fd6be49c418","metadata":{"trusted":true},"outputs":[],"source":["ivf_flat_index = vs.ingest(\n"," index_type=\"IVF_FLAT\",\n"," source_uri=source_uri,\n"," sift_index_uri=ivf_index_uri,\n",")"]},{"cell_type":"code","execution_count":7,"id":"87c535e9-d10b-436e-a098-4a376b74711f","metadata":{"trusted":true},"outputs":[],"source":["# Get query vectors with ground truth\n","query_vectors = load_fvecs(\"/tmp/siftsmall_query.fvecs\")\n","ground_truth = load_ivecs(\"/tmp/siftsmall_groundtruth.ivecs\")"]},{"cell_type":"code","execution_count":8,"id":"0efa4265-aabf-4f5f-8ea2-1c7081d70e47","metadata":{"trusted":true},"outputs":[],"source":["def accuracy(result, gt):\n"," found = 0\n"," total = 0\n"," i = 0\n"," for r in result:\n"," total += len(r)\n"," found += len(np.intersect1d(r, gt[i]))\n"," i += 1\n"," return found / total"]},{"cell_type":"code","execution_count":9,"id":"cdc67d8d-cea3-4017-b42b-26952a2a84a6","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 1.0\n"]}],"source":["# Return the 100 most similar vectors to the query vectors with FLAT\n","result_d, result_i = flat_index.query(query_vectors, k=100)\n","ac = accuracy(result_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac == 1.0\n","\n"]},{"cell_type":"code","execution_count":10,"id":"5217e86d-a30d-4a16-b39b-c31d123786be","metadata":{"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 0.9204\n"]}],"source":["# Return the 100 most similar vectors to the query vectors with 
IVF_FLAT\n","# (you can set the nprobe parameter)\n","result_ivf_d, result_ivf_i = ivf_flat_index.query(query_vectors, nprobe=10, k=100)\n","ac = accuracy(result_ivf_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac >= 0.85"]},{"cell_type":"code","execution_count":null,"id":"7fee6898-9369-43dd-92fa-067bac9df452","metadata":{"trusted":true},"outputs":[],"source":["# Test distributed query\n","result_ivf_d, result_ivf_i = ivf_flat_index.query(query_vectors, nprobe=10, k=100, mode=tiledb.cloud.dag.Mode.BATCH, num_partitions=2)\n","ac = accuracy(result_ivf_i, ground_truth)\n","print(f\"Accuracy: {ac}\")\n","assert ac >= 0.85"]},{"cell_type":"code","execution_count":null,"id":"662974e8","metadata":{},"outputs":[],"source":["delete_if_exists(sift_index_uri)\n","delete_if_exists(ivf_index_uri)"]}],"metadata":{"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.18"}},"nbformat":4,"nbformat_minor":5} |
Please revert all changes to the test notebooks or explain why they are needed. There shouldn't be any changes to our Python API from this PR.
@@ -19,6 +20,271 @@ def query_and_check_equals(index, queries, expected_result_d, expected_result_i)
    check_equals(result_d=result_d, result_i=result_i, expected_result_d=expected_result_d, expected_result_i=expected_result_i)



def test_ivf_flat_ingestion_with_updates_and_timetravel(tmp_path):
Please move this back to its previous place in the file so that the diff is smaller and we can confirm that the test code itself has not changed.
This reverts commit 0a17f1b.
TEST_CASE("scoring: vector test", "[scoring]") {
  auto a = std::vector<float>(10);
  auto b = std::vector<float>(10);
  auto c = std::vector<float>(10);
  auto d = std::vector<float>(10);
  auto z = std::vector<float>(10);

  std::iota(begin(a), end(a), 5);
  std::iota(begin(b), end(b), 5);
  std::iota(begin(c), end(c), 4);
  std::iota(begin(d), end(d), 6);
  std::fill(begin(z), end(z), 0);

  REQUIRE(!std::equal(begin(a), end(a), begin(z)));
  REQUIRE(!std::equal(begin(b), end(b), begin(z)));
  REQUIRE(!std::equal(begin(c), end(c), begin(z)));
  REQUIRE(!std::equal(begin(d), end(d), begin(z)));

  SECTION("sum_of_squares function") {
    CHECK(sum_of_squares(a, b) == 0);
    CHECK(sum_of_squares(a, c) == 10);
    CHECK(sum_of_squares(c, a) == 10);
    CHECK(sum_of_squares(a, d) == 10);
    CHECK(sum_of_squares(d, a) == 10);
    CHECK(sum_of_squares(a, z) == 985);
  }

  SECTION("sum_of_squares_distance function object") {
    CHECK(sum_of_squares_distance{}(a, b) == 0);
    CHECK(sum_of_squares_distance{}(a, c) == 10);
    CHECK(sum_of_squares_distance{}(c, a) == 10);
    CHECK(sum_of_squares_distance{}(a, d) == 10);
    CHECK(sum_of_squares_distance{}(d, a) == 10);
    CHECK(sum_of_squares_distance{}(a, z) == 985);
  }
  SECTION("sub_sum_of_squares function") {
    CHECK(sub_sum_of_squares(a, b, 0, size(a)) == 0);
    CHECK(sub_sum_of_squares(a, c, 0, size(a)) == 10);
    CHECK(sub_sum_of_squares(c, a, 0, size(a)) == 10);
    CHECK(sub_sum_of_squares(a, d, 0, size(a)) == 10);
    CHECK(sub_sum_of_squares(d, a, 0, size(a)) == 10);
    CHECK(sub_sum_of_squares(a, z, 0, size(a)) == 985);

    CHECK(sub_sum_of_squares(a, b, 0, size(a) / 2) == 0);
    CHECK(sub_sum_of_squares(a, c, 0, size(a) / 2) == 5);
    CHECK(sub_sum_of_squares(c, a, 0, size(a) / 2) == 5);
    CHECK(sub_sum_of_squares(a, d, 0, size(a) / 2) == 5);
    CHECK(sub_sum_of_squares(d, a, 0, size(a) / 2) == 5);
    CHECK(sub_sum_of_squares(a, z, 0, size(a) / 2) == 255);

    CHECK(sub_sum_of_squares(a, b, size(a) / 2, size(a)) == 0);
    CHECK(sub_sum_of_squares(a, c, size(a) / 2, size(a)) == 5);
    CHECK(sub_sum_of_squares(c, a, size(a) / 2, size(a)) == 5);
    CHECK(sub_sum_of_squares(a, d, size(a) / 2, size(a)) == 5);
    CHECK(sub_sum_of_squares(d, a, size(a) / 2, size(a)) == 5);
    CHECK(sub_sum_of_squares(a, z, size(a) / 2, size(a)) == 730);
  }

  SECTION("sub_sum_of_squares_distance{} function object") {
    CHECK(sub_sum_of_squares_distance{0, size(a)}(a, b) == 0);
    CHECK(sub_sum_of_squares_distance{0, size(a)}(a, c) == 10);
    CHECK(sub_sum_of_squares_distance{0, size(a)}(c, a) == 10);
    CHECK(sub_sum_of_squares_distance{0, size(a)}(a, d) == 10);
    CHECK(sub_sum_of_squares_distance{0, size(a)}(d, a) == 10);
    CHECK(sub_sum_of_squares_distance{0, size(a)}(a, z) == 985);

    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(a, b) == 0);
    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(a, c) == 5);
    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(c, a) == 5);
    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(a, d) == 5);
    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(d, a) == 5);
    CHECK(sub_sum_of_squares_distance{0, size(a) / 2}(a, z) == 255);

    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(a, b) == 0);
    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(a, c) == 5);
    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(c, a) == 5);
    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(a, d) == 5);
    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(d, a) == 5);
    CHECK(sub_sum_of_squares_distance{size(a) / 2, size(a)}(a, z) == 730);
  }
}
@lums658 do we need this?
Yes. These function objects are what allow PQ to work with current query functions.
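To illustrate the point about function objects, here is a small Python stand-in (not the C++ library's code) for a sub-range squared-distance object. Because the range is bound at construction time, the object has the same two-argument call shape as a full-vector distance and can be handed to a generic query routine. The class name and the `nearest` helper are hypothetical.

```python
class SubSumOfSquaresDistance:
    """Squared L2 distance restricted to the index range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __call__(self, u, v):
        lhs = u[self.start:self.stop]
        rhs = v[self.start:self.stop]
        return sum((x - y) ** 2 for x, y in zip(lhs, rhs))


def nearest(query, candidates, distance):
    """Generic query routine: index of the candidate closest to `query`.

    It only requires that `distance(u, v)` is callable, so full-vector and
    sub-vector distances are interchangeable here.
    """
    return min(range(len(candidates)), key=lambda i: distance(query, candidates[i]))


# Same vectors as the C++ test above: a = 5..14, c = 4..13, z = zeros.
a = list(range(5, 15))
c = list(range(4, 14))
z = [0] * 10
```

On these vectors the object reproduces the values checked in the test: 985 over the full range against `z`, and 255 and 730 over the first and second halves.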
  for (unsigned iter = 0; iter < trials; ++iter) {
    for (unsigned i = 0; i < size_v; ++i) {
      heap.insert({v[i], i});
    }
  }
}

template <class Heap>
void do_time_pair_heap(
    const std::string& msg,
    Heap& heap,
    const std::vector<size_t>& v,
    size_t trials = 1) {
  scoped_timer _{msg, true};

  auto size_v = v.size();

  for (unsigned iter = 0; iter < trials; ++iter) {
    for (unsigned i = 0; i < size_v; ++i) {
      heap.insert(v[i], i);
    }
  }
}

template <class T, class U, bool use_push = false, bool use_pop = false>
class simple_pair_heap : public std::vector<std::tuple<T, U>> {
  using Base = std::vector<std::tuple<T, U>>;
  // using Base::Base;
  unsigned max_size{0};

 public:
  explicit simple_pair_heap(std::integral auto k)
      : Base(0)
      , max_size{(unsigned)k} {
    Base::reserve(k);
  }

  explicit simple_pair_heap(
      unsigned k, std::initializer_list<std::tuple<T, U>> l)
      : Base(0)
      , max_size{k} {
    Base::reserve(k);
    for (auto& p : l) {
      insert(std::get<0>(p), std::get<1>(p));
    }
  }

  bool insert(const T& x, const U& y) {
    if (Base::size() < max_size) {
      Base::emplace_back(x, y);
      if constexpr (use_push) {
        std::push_heap(begin(*this), end(*this), [&](auto& a, auto& b) {
          return std::get<0>(a) < std::get<0>(b);
        });
      } else {
        if (Base::size() == max_size) {
          std::make_heap(begin(*this), end(*this), [&](auto& a, auto& b) {
            return std::get<0>(a) < std::get<0>(b);
          });
        }
      }
      return true;
    } else if (x < std::get<0>(this->front())) {
      std::pop_heap(begin(*this), end(*this), [&](auto& a, auto& b) {
        return std::get<0>(a) < std::get<0>(b);
      });

      if constexpr (use_pop) {
        this->pop_back();
        this->emplace_back(x, y);
      } else {
        // std::get<0>(this->back()) = x;
        // std::get<1>(this->back()) = y;
        (*this)[max_size - 1] = std::make_tuple(x, y);
        // std::get<0>((*this)[max_size - 1]) = x;
        // std::get<1>((*this)[max_size - 1]) = y;
      }

      std::push_heap(begin(*this), end(*this), [&](auto& a, auto& b) {
        return std::get<0>(a) < std::get<0>(b);
      });
      return true;
    }
    return false;
  }
};

int main() {
  // Use a random device as the seed for the random number generator
  std::random_device rd;
  std::mt19937 rng(rd());

  for (size_t n : {10, 1000, 100'000, 100'000'000}) {
    std::vector<float> scores(n);
    std::vector<size_t> v(n);

    std::iota(begin(v), end(v), 17);
    std::iota(begin(scores), end(scores), 17);

    std::shuffle(begin(v), end(v), rng);
Do we need this file?
Yes. The fixed min heap is a performance bottleneck in most of our code. We need to be able to time different approaches to make sure our implementation isn't regressing -- and to also compare implementation alternatives in terms of performance.
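As a rough illustration of what such a timing harness compares, the sketch below is a Python stand-in for a fixed-size "keep the k smallest" heap plus a simple timer. It is not the C++ benchmark above; `FixedMinHeap` and `time_inserts` are hypothetical names, and `heapq` with negated scores is used here in place of the `std::push_heap` / `std::pop_heap` calls.

```python
import heapq
import time


class FixedMinHeap:
    """Keep the k smallest (score, id) pairs seen so far."""

    def __init__(self, k):
        self.k = k
        # heapq is a min-heap, so scores are negated to keep the largest
        # retained score at the root, where it is cheap to evict.
        self._heap = []

    def insert(self, score, ident):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-score, ident))
            return True
        if score < -self._heap[0][0]:
            # New score beats the current worst: replace the root in one step.
            heapq.heapreplace(self._heap, (-score, ident))
            return True
        return False

    def items(self):
        """Return the retained pairs sorted by ascending score."""
        return sorted((-neg, ident) for neg, ident in self._heap)


def time_inserts(heap, scores, trials=1):
    """Insert every score `trials` times and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(trials):
        for ident, score in enumerate(scores):
            heap.insert(score, ident)
    return time.perf_counter() - start
```

Running `time_inserts` against alternative heap classes that expose the same `insert(score, ident)` method gives a like-for-like comparison, which is the role the benchmark file plays for the C++ variants.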
/**
 * @file unit_timetravel.cc
 *
 * @section LICENSE
 *
 * The MIT License
 *
 * @copyright Copyright (c) 2023 TileDB, Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *
 * @section DESCRIPTION
 *
 * Test time traveling support in TileDB-Vector-Search
 *
 */

#include <algorithm>
#include <catch2/catch_all.hpp>
#include <chrono>
Do we need this file?
For this PR, no. For TileDB-Vector-Search, absolutely. And it's well overdue. We have been incorporating time-travel bits into the C++ library, but have no unit tests for it -- partly because there isn't any spec or other explanation of correct behavior and thus no way to write actual unit tests.
But if unit tests ever get written for time traveling -- and I hope they will -- this is the file where they should go.
Do we need this file?
This file is used for instrumenting and analyzing the graphs that vamana builds and is crucial for debugging. Some things just aren't amenable to automated unit testing. (Though there is more we could be testing about the built graphs and the building process.)
Do we need this file?
Same as above.
@lums658 I merged
Closing for now, we can come back and reference this functionality as we need to.
The main changes to this branch as compared to main are given in the PR messages for #159 and #173 (and other relevant ancestors).
Relative to #173 itself:
test_ivf_flat_ingestion_with_updates_and_timetravel and test_ivf_flat_ingestion_with_additions_and_timetravel