As per the discussion in the MTEB repo, embeddings-benchmark/mteb#1861 (comment), it seems the issue is that the default behaviour of YourCustomDEModel silently adds prefixes to queries and documents.
@archersama perhaps it would be worthwhile to not have this as the default behaviour?
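To illustrate why a silently added prefix matters, here is a toy sketch. It is not the coir or mteb code: the bag-of-words `embed` function stands in for a real embedding model, and the two prefix strings are hypothetical, chosen only for the example. The point is simply that prepending text to queries and documents changes their representations, which can change similarity scores and even the ranking.

```python
import math
from collections import Counter

# Hypothetical prefixes, for illustration only (not taken from coir or mteb).
QUERY_PREFIX = "Represent this query: "
DOC_PREFIX = "Represent this document: "


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, query_prefix="", doc_prefix=""):
    """Return document ids sorted by descending similarity to the query."""
    q = embed(query_prefix + query)
    scores = {
        doc_id: cosine(q, embed(doc_prefix + text))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "sort list of numbers in ascending order",
    "d2": "reverse list",
}
query = "sort list"

# Same query and corpus, scored without and with the prefixes.
print("no prefix:  ", rank(query, docs))
print("with prefix:", rank(query, docs, QUERY_PREFIX, DOC_PREFIX))
```

In this toy setup the two rankings differ: the words shared by both prefixes inflate the similarity of the short document more than that of the long one. A real model will behave differently in the details, but the same mechanism means two harnesses that disagree on prefixes are not evaluating the same inputs.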
This is a crosspost for visibility of the issue I just opened on mteb: embeddings-benchmark/mteb#1861
Hi there!
When reviewing the new gte-modernbert-base model, I noticed I struggled to reproduce their CoIR results with the coir library. After a bit of digging and a pointer from the authors, it appears that the mteb library matches their results, but that those are wildly different from what coir reports!
Recently, there has also been some discussion about mismatched code retrieval results for the new SFR model vs Voyager (here), and while I haven't yet had time to test it out, the magnitude of the discrepancies appears to be fairly similar to what I'm seeing, so this could be the issue.
Even more puzzling, in trying to figure out which one was correct, I whipped up an extremely simple ST + ranx notebook and it gave me results that... matched neither library 😭 although it was way closer to mteb than to coir. This was put together very quickly late at night, so there might be one silly mistake somewhere in there causing the issues.
I've put together a repository to reproduce the exact issue with minimal scripts, using exactly the code I ran.
Direct links:
coir code
ranx code
mteb code
Let me know if I can do anything else to help diagnose this!