Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IVF_PQ re-ranking #502

Merged
merged 32 commits into from
Sep 6, 2024
Merged

IVF_PQ re-ranking #502

merged 32 commits into from
Sep 6, 2024

Conversation

jparismorgan
Copy link
Contributor

@jparismorgan jparismorgan commented Aug 22, 2024

What

Adds support for re-ranking to IVF_PQ. This is controlled by k_factor. We first query for k_factor * k vector IDs and indices, then fetch the original (not PQ-encoded) vectors and re-rank those.

Testing

  • Adds several new tests

Benchmarks

Here are the results with local-benchmarks SIFT 1 million. Note that this is with local benchmarks so doesn't include network latency. But for the everything is local case we get a nice accuracy improvement with a fairly small speed penality:

IVF_PQ_query_time_vs_accuracy

@jparismorgan jparismorgan marked this pull request as ready for review September 5, 2024 10:14
Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on implementing this!

Can we also get the benchmark results for SIFT-1M or SIFT-10M and S3 storage? This will give us a good idea of whether we need more code optimizations on the reranking part of the code.

apis/python/test/conftest.py Outdated Show resolved Hide resolved
apis/python/test/test_index.py Show resolved Hide resolved
apis/python/test/test_ingestion.py Outdated Show resolved Hide resolved
src/include/index/ivf_pq_index.h Outdated Show resolved Hide resolved
std::move(initial_distances), std::move(initial_ids));
}

if (::num_vectors(feature_vectors_) == 0 && !group_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also load all feature_vectors_ for the infinite memory case during the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep so that will happen (we hit that code in the second call to rerank_query() in rerank().

But this was just to cover the case where you create an empty index and then try to query with re-ranking. But I realized we actually don't need to throw here, so updated by moving the check for group_ below and only reading from the URI if we have the group. Also added a test for this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I still don't understand where we load the full feature_vectors_ array. Shouldn't this be inside read_index_infinite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oops, I thought you meant loading feature_vectors_ during add(). You are correct, we do not load the full feature_vectors_ anywhere except during add().

In the constructor we don't know if we're opening for infinite or finite queries, that is per-query. We could perhaps load the full feature_vectors_ during the call to query_infinite() (which calls read_index_infinite()), but then we will perhaps be loading way more vectors than we actually need to, and than we would if we just loaded them in rerank(). The benefit though would be we don't need to do any load during the second call to query_infinite(), whereas today we do.

Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple options here, so maybe add some query/constructor parameters to make sure that we can leverage all of them and understand what happens in each case. For example, for small arrays and infinite memory, it would make sense to load the full feature_vectors_ array once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline - will do this along with moving memory_budget into the index constructor in a follow-up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO or shortcut for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started on it here! #518

@jparismorgan
Copy link
Contributor Author

jparismorgan commented Sep 5, 2024

Can we also get the benchmark results for SIFT-1M or SIFT-10M and S3 storage? This will give us a good idea of whether we need more code optimizations on the reranking part of the code.

Yep, will do, just opened up #514 which will make this comparison easy. Would prefer to not block this PR on that though, if it's okay with you? The k_factor = 1.f case is unaffected by this, so seems fine to get it merged even if it does turn out to be slow for remote URIs.

@NikolaosPapailiou
Copy link
Collaborator

Can we also get the benchmark results for SIFT-1M or SIFT-10M and S3 storage? This will give us a good idea of whether we need more code optimizations on the reranking part of the code.

Yep, will do, just opened up #514 which will make this comparison easy. Would prefer to not block this PR on that though, if it's okay with you? The k_factor = 1.f case is unaffected by this, so seems fine to get it merged even if it does turn out to be slow for remote URIs.

As this is new functionality mostly relevant for larger datasets stored in object stores, I would prefer to see the performance of it to understand the current state. And since #514 is ready I think we should try to produce it.

@jparismorgan
Copy link
Contributor Author

Here are the results of local-benchmarks.py on SIFT 1M using k=100. It is run from my local laptop with an index_uri that is on AWS.

query_infinite_ram()
query_time_vs_accuracy

query_finite_ram() with memory_budget=1000000 for both IVF_FLAT and IVF_PQ:
query_time_vs_accuracy

Observations:

  • IVF_PQ re-ranking seems to be a good improvement versus no re-ranking. With it, IVF_FLAT is still better, but IVF_PQ is now reasonably close.
  • IVF_PQ infinite queries seem to do better than finite queries.

Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to perform more benchmarks/optimizations to make sure that IVF_PQ is materially better than IVF_FLAT for large objectstore-based datasets and OOC query execution. However, I am ok to merge this as a first version of reranking and follow up in a more specific PR.

@jparismorgan jparismorgan merged commit 8b4a53e into main Sep 6, 2024
6 checks passed
@jparismorgan jparismorgan deleted the jparismorgan/ivf-pq-reranking branch September 6, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants