Fix bug where we did not set compression filters when creating TileDB Arrays in C++ #436
What
Currently, when we create TileDB arrays with `create_vector()` or `create_matrix()`, we pass a `filter`, but then do not use it when creating the array. This leads to arrays (and indexes) that are very large. We fix that here.
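The fix itself lands in the library's C++ array-creation path, but the pattern is the same in any TileDB API. As a minimal sketch, here is the equivalent in TileDB's Python API; the URI, attribute name, dimension bounds, and choice of Zstd are illustrative assumptions, not the library's actual values:

```python
import numpy as np
import tiledb

# Build the compression filter list. The bug was that a filter like this
# was passed in but never attached to the schema, so data was written
# uncompressed.
filters = tiledb.FilterList([tiledb.ZstdFilter()])

dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 99), tile=100, dtype=np.int32)
)

# Attaching `filters` to the attribute is the step that was missing.
attr = tiledb.Attr(name="values", dtype=np.float32, filters=filters)

schema = tiledb.ArraySchema(domain=dom, attrs=[attr])
tiledb.Array.create("example_vector_array", schema)
```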
Results with small datasets

Before, when we created a Vamana index from 4 `float` vectors with 3 dimensions each (a sketch of this kind of index creation follows below), we'd end up with a 314.5 MB index, where each individual array containing vectors was 89.5 MB.
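The exact snippet from the PR is not reproduced here. As a hedged sketch, creating such an index through the Python `ingest` entry point might look like the following; the `index_type="VAMANA"` value, the `input_vectors` usage, and the URI are assumptions on my part, not taken from this PR:

```python
import numpy as np
from tiledb.vector_search.ingestion import ingest

# Four 3-dimensional float32 vectors, matching the sizes described above.
vectors = np.array(
    [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
        [10.0, 11.0, 12.0],
    ],
    dtype=np.float32,
)

# Assumption: ingest() supports Vamana via index_type="VAMANA" and accepts
# in-memory vectors through input_vectors; the index_uri is illustrative.
index = ingest(
    index_type="VAMANA",
    index_uri="/tmp/test_vamana_index",
    input_vectors=vectors,
)
```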
After this change we end up with a 261 KB index, and an individual vector array takes up 66 KB.
Results on SIFT
The improvement is even greater with SIFT (1 million 128-dimension `float32` vectors), creating the Vamana index the same way as in the sketch above.
Before, we would end up with a 23.88 GB index, where `adjacency_ids` alone took up 15.55 GB. Now our index is 761 MB, and `adjacency_ids` takes up 281 MB.