Support out-of-core and distributed IVF_PQ ingestion #531
+ "".join(random.choices(string.ascii_letters, k=10))
)
if index_type == "IVF_PQ":
PARTIAL_WRITE_ARRAY_DIR = storage_formats[storage_version][
Why not follow the same dir structure? The random number part is required in order to support parallel or failed ingestions.
Ah, I did not realize it was for parallel ingestions; I thought it was just for failed ingestions. So I added logic in C++ to delete the group before use, because I didn't want Python and C++ to have to communicate about the name of the dir. But I'll update to follow the pattern in this follow-up PR: #554
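For readers following the thread, here is a minimal sketch of the naming pattern under discussion: a random suffix on the temp dir name, so that parallel or retried ingestions do not share a directory. The helper name `temp_dir_name`, the `base` argument, and the underscore separator are assumptions for illustration. Only the `random.choices(string.ascii_letters, k=10)` expression comes from the diff.

```python
import random
import string


def temp_dir_name(base, suffix_len=10):
    """Return `base` plus a random ASCII-letter suffix.

    Hypothetical helper: only the random.choices(...) expression is taken
    from the diff; the name, arguments, and separator are illustrative.
    """
    suffix = "".join(random.choices(string.ascii_letters, k=suffix_len))
    return f"{base}_{suffix}"


# Two ingestions started in parallel each get their own directory name.
first = temp_dir_name("temp_data")
second = temp_dir_name("temp_data")
```

With 10 letters drawn from 52, a collision between two concurrent ingestions is very unlikely, though not impossible.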
What
Here we update to support out-of-core and distributed ingestion in IVF_PQ. We do this by updating the IVF_PQ API. The general design is to use the IVF_FLAT code path, but use these IVF_PQ C++ functions:
create()
train()
ingest_parts()
consolidate_partitions()
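The call sequence listed above can be sketched with a toy stand-in. This is not the real IVF_PQ API. The class `ToyIvfIndex`, all method signatures, and the "training" step (random sample vectors as centroids) are assumptions for illustration, and there is no PQ encoding. It only shows how ingesting in parts and then consolidating lets each part be processed independently.

```python
import math
import random


class ToyIvfIndex:
    """Hypothetical stand-in that mirrors the four method names only."""

    def create(self, dimensions, num_partitions):
        self.dimensions = dimensions
        self.num_partitions = num_partitions
        self.centroids = []
        self.partial_writes = []  # one list of (partition, vector id) per part
        self.partitions = {}      # partition -> sorted vector ids

    def train(self, sample, seed=0):
        # Toy training: pick random sample vectors as centroids.
        rng = random.Random(seed)
        self.centroids = rng.sample(sample, self.num_partitions)

    def ingest_parts(self, vectors, start, end):
        # Assign vectors[start:end] to their nearest centroid and keep the
        # result as a separate partial write, independent of other parts.
        part = []
        for vid in range(start, end):
            nearest = min(
                range(len(self.centroids)),
                key=lambda c: math.dist(vectors[vid], self.centroids[c]),
            )
            part.append((nearest, vid))
        self.partial_writes.append(part)

    def consolidate_partitions(self):
        # Merge all partial writes into the final per-partition id lists.
        merged = {c: [] for c in range(len(self.centroids))}
        for part in self.partial_writes:
            for partition, vid in part:
                merged[partition].append(vid)
        self.partitions = {c: sorted(ids) for c, ids in merged.items()}
        self.partial_writes = []


vectors = [(float(i), float(i % 7)) for i in range(100)]
index = ToyIvfIndex()
index.create(dimensions=2, num_partitions=4)
index.train(vectors[:20])
for start in range(0, len(vectors), 25):  # four parts of 25 vectors each
    index.ingest_parts(vectors, start, min(start + 25, len(vectors)))
index.consolidate_partitions()
```

In this sketch the full `vectors` list sits in memory for brevity. In an out-of-core or distributed setting, each `ingest_parts` call would presumably load only its own slice, possibly on a different worker.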
Note that we also use create_temp_data_group() to support re-ingestion with a new temp data group.
In the future we may also want to move compute_partition_indexes() into C++, but for now we leave it in Python.
We are still able to support C++-only use of the index with this approach, though to make things easier we also support ingest(), so that a user only needs to call create(), train(), and then ingest().
Testing