Generalize Custom search() Method #1826
Comments
Completely agree. It would be nice to just specify an interface for this that any model could implement. (cc @orionw)
Thanks for raising @sam-hey! I can definitely see the benefit! On the other hand, having it standardized means each model class uses the same function, which is more reliable. I can see both sides, but personally I think I would prefer to keep the core search functions in MTEB, so users can see them there and assume each model searches the same way within its own “class” (e.g. that all dense retrievers use the same base functionality).

I think it’d be great if we made BM25 a first-class MTEB model so we didn’t have to rely on that name check (and could also add other sparse non-neural versions like Pyserini).

OTOH, there are probably three or so other model “classes” or types that would involve different search functionality: multi-vector (like ColBERT, as you say), and then perhaps neural sparse retrieval (like SPLADE) and generative retrieval. So we should definitely make it possible to add each of these, which as @KennethEnevoldsen says likely involves a change to the interface. But since there are fewer than 10 model “classes”, it seems like we could do that with an if statement. But perhaps it’s too early in the morning and I’m missing something!
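To make the "if statement" idea concrete, here is a rough sketch of dispatching on a small, fixed set of model classes. Everything in it is hypothetical: `ModelKind`, `select_search_routine`, and the routine names are made up for illustration and are not part of MTEB.

```python
from enum import Enum


class ModelKind(Enum):
    """Hypothetical set of model "classes" named in the comment above."""

    DENSE = "dense"
    LEXICAL = "lexical"              # e.g. BM25
    MULTI_VECTOR = "multi_vector"    # e.g. ColBERT
    NEURAL_SPARSE = "neural_sparse"  # e.g. SPLADE
    GENERATIVE = "generative"


def select_search_routine(kind: ModelKind) -> str:
    """Return the name of the (hypothetical) core search routine for a model class."""
    if kind is ModelKind.DENSE:
        return "dense_search"
    if kind is ModelKind.LEXICAL:
        return "lexical_search"
    if kind is ModelKind.MULTI_VECTOR:
        return "multi_vector_search"
    # Classes without a core routine yet fail loudly instead of silently
    # falling back to dense search.
    raise NotImplementedError(f"no search routine for {kind.value}")
```

With this shape, the search code stays in one place and a new model class means one new enum member plus one new branch.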
Currently, only BM25 uses a custom implementation of the `search()` method, achieved by checking whether the model name is `bm25s`. This approach is not scalable or practical for future implementations that require custom search methods, such as `ColBERT` with an index. A more flexible and modular solution is needed to accommodate diverse search strategies.

https://github.com/embeddings-benchmark/mteb/blob/main/mteb/evaluation/evaluators/RetrievalEvaluator.py#L472:L475
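One possible shape for such a solution, as a minimal sketch only: the evaluator checks whether a model provides its own `search()` rather than checking its name. The names `SupportsSearch`, `run_search`, `default_search`, and `KeywordOverlapModel` are assumptions made up for this example, and the signature is not MTEB's actual one.

```python
from typing import Protocol, runtime_checkable

# Hypothetical result shape: query id -> {doc id -> score}.
Results = dict[str, dict[str, float]]


@runtime_checkable
class SupportsSearch(Protocol):
    """Hypothetical interface for models that ship their own search()."""

    def search(
        self, corpus: dict[str, str], queries: dict[str, str], top_k: int
    ) -> Results: ...


def default_search(model, corpus, queries, top_k) -> Results:
    """Placeholder for the evaluator's built-in search path (returns no hits)."""
    return {qid: {} for qid in queries}


class KeywordOverlapModel:
    """Toy model with a custom search(): scores documents by shared tokens."""

    def search(self, corpus, queries, top_k):
        results = {}
        for qid, query in queries.items():
            q_tokens = set(query.lower().split())
            scores = {
                doc_id: float(len(q_tokens & set(text.lower().split())))
                for doc_id, text in corpus.items()
            }
            top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            results[qid] = dict(top)
        return results


def run_search(model, corpus, queries, top_k=10) -> Results:
    # Dispatch on capability (does the model define search()?), not on its name.
    if isinstance(model, SupportsSearch):
        return model.search(corpus, queries, top_k)
    return default_search(model, corpus, queries, top_k)
```

Under this sketch, a ColBERT wrapper with an index would only need to define `search()`; the evaluator would not need to know the model's name.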