Is your feature request related to a problem?
Yes, the current limit of 20,000 on n_list in OpenSearch's FAISS KNN implementation restricts the ability to fine-tune the balance between search accuracy and performance for larger datasets (more than 10 billion vectors). This can be particularly frustrating when working with high-dimensional data or large-scale vector search use cases, where a higher n_list value could improve recall and precision.
What solution would you like?
I would like the ability to configure or increase the n_list parameter beyond the current limit of 20,000. This would allow users to better optimize FAISS's IVF for their specific datasets and use cases. Ideally, this could be implemented as a configurable parameter in the OpenSearch KNN plugin, with appropriate warnings or documentation about the potential performance trade-offs.
What alternatives have you considered?
Using other indexing methods, such as HNSW, which may not require n_list but have their own trade-offs in terms of memory usage and search performance.
Running standalone FAISS outside of OpenSearch, though this would sacrifice the distributed capabilities of OpenSearch.
Adjusting other FAISS parameters (e.g., n_probe) to compensate for the lower n_list, but this does not always provide the desired level of accuracy.
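To make the n_probe trade-off concrete, here is a rough sketch of how many vectors an IVF query scans. It assumes evenly sized clusters, which real data rarely gives you, and `expected_scanned` is a hypothetical helper, not part of FAISS or the OpenSearch KNN plugin.

```python
def expected_scanned(num_vectors: int, nlist: int, nprobe: int) -> float:
    """Approximate vectors compared per query, assuming balanced clusters."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    # Each probed cluster holds roughly num_vectors / nlist vectors.
    return num_vectors * nprobe / nlist


if __name__ == "__main__":
    # Illustrative numbers only: 1B vectors in a single IVF index.
    for nlist in (20_000, 200_000):
        scanned = expected_scanned(1_000_000_000, nlist, nprobe=10)
        print(f"nlist={nlist:,} nprobe=10 -> ~{scanned:,.0f} vectors scanned")
```

With n_list capped, the only way to scan fewer vectors per query is to lower n_probe, and that is where the accuracy loss comes from.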
Do you have any additional context?
I noticed that OpenSearch imposes a limit of 20,000 for the n_list parameter when using FAISS for KNN search. Could you please explain the reasoning behind this limitation? Specifically:
Is this restriction related to performance considerations, such as indexing or query latency?
Are there technical constraints in the integration of FAISS with OpenSearch that necessitate this limit?
Are there plans to increase or make this limit configurable in future releases?
Additionally, if I need a higher n_list value for my use case, what alternatives or workarounds would you recommend?
Thank you for your insights!
Thanks for the response! For large-scale datasets (e.g., 1B+ vectors), we need n_list in the range of hundreds of thousands to millions of centroids (e.g., 100K–1M) to achieve optimal accuracy.
For OpenSearch, we have one IVF index per segment per shard, so we typically don't recommend a very large number of clusters because a significant portion of the data structures will be duplicated. For instance, if you have 10 shards on a node, each with 10 segments, the centroids will be duplicated 10 × 10 = 100 times. We've mitigated some of this overhead (#1507), but that doesn't remove all of it. Too many centroids can therefore cause memory concerns, and that's why we capped n_list at 20k.
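As a back-of-the-envelope illustration of the duplication described above, the sketch below estimates centroid memory per node. It assumes float32 centroids, a hypothetical dimension of 768, and one full copy per segment. It ignores the mitigation in #1507, so treat it as a rough upper bound; the function names are made up for this example.

```python
def centroid_bytes(nlist: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of one centroid table (assumes float32 values by default)."""
    return nlist * dim * bytes_per_value


def duplicated_centroid_bytes(
    nlist: int, dim: int, shards_per_node: int, segments_per_shard: int
) -> int:
    """Total centroid memory on a node if every segment holds its own copy."""
    copies = shards_per_node * segments_per_shard
    return copies * centroid_bytes(nlist, dim)


if __name__ == "__main__":
    # 10 shards x 10 segments, as in the example above; dim=768 is an assumption.
    for nlist in (20_000, 100_000, 1_000_000):
        total = duplicated_centroid_bytes(
            nlist, dim=768, shards_per_node=10, segments_per_shard=10
        )
        print(f"nlist={nlist:>9,}: {total / 2**30:.1f} GiB of centroids per node")
```

Under these assumptions the cost grows linearly with n_list, so moving from 20k to the 100K–1M range multiplies the duplicated memory by 5x to 50x.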