Discrepancies in CoIR results #1861
Sister issue in […]
This is probably because MTEB adds the […]. I also looked into your implementation of CoIR, and you're using […]. In your custom implementation you normalize your embeddings; maybe this gives mismatches too.
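A toy sketch of why normalization can matter when scores come from a plain dot product: with unnormalized vectors, a document with a large norm can outrank one that points in almost the same direction as the query. The vectors are made up for illustration and are not taken from either library.

```python
import numpy as np

# Made-up embeddings: one query and three documents.
query = np.array([1.0, 0.0])
docs = np.array([
    [0.9, 0.1],  # nearly parallel to the query, small norm
    [3.0, 3.0],  # 45 degrees off the query, but a much larger norm
    [0.0, 1.0],  # orthogonal to the query
])


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Raw dot product: magnitude leaks into the score.
raw_scores = docs @ query
# Cosine similarity: dot product of unit vectors, so only direction counts.
cos_scores = l2_normalize(docs) @ l2_normalize(query)

raw_ranking = np.argsort(-raw_scores)
cos_ranking = np.argsort(-cos_scores)
print(raw_ranking, cos_ranking)  # [1 0 2] [0 1 2]
```

If one pipeline normalizes and another scores raw embeddings with a dot product, the rankings, and therefore NDCG@10, can differ even with the same model.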
Hey, thanks for the response!
I thought so as well, so I used cosqa for the example to avoid things being messed up by different […]
I ran it both ways to be sure. The non-normalized NDCG@10 is […]
Oh, well spotted! I think this is the culprit. I didn't expect it to add prefixes by default, but re-running it gets me within ~1 NDCG@10 of the MTEB results. I'm somewhat curious why the manual implementation is still well off 🤔
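For reference, a minimal sketch of what a manual dense-retrieval evaluation computes: rank the corpus by cosine similarity, then score the ranking with nDCG@10 (linear gains, log2 discount). Random vectors stand in for `SentenceTransformer.encode` outputs, and `retrieve` and `ndcg_at_k` are hypothetical helpers, not functions from `ranx`, `mteb`, or `coir`. Library implementations may differ in details such as the gain function or tie-breaking, which is one place small score differences can creep in.

```python
import numpy as np


def ndcg_at_k(ranked_doc_ids, relevant, k=10):
    """nDCG@k with linear gains (relevant maps doc_id -> grade), log2(rank + 1) discount."""
    gains = [relevant.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevant.values(), reverse=True)[:k]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def retrieve(query_emb, corpus_emb, corpus_ids, k=10):
    """Return the top-k corpus ids per query, ranked by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = q @ c.T
    top = np.argsort(-scores, axis=1)[:, :k]
    return [[corpus_ids[j] for j in row] for row in top]


# Hypothetical stand-ins for model.encode(...) outputs.
rng = np.random.default_rng(0)
corpus_ids = [f"doc{i}" for i in range(20)]
corpus_emb = rng.normal(size=(20, 16))
# Each query is a noisy copy of one document, so it has one clear relevant hit.
query_emb = corpus_emb[[3, 7]] + 0.05 * rng.normal(size=(2, 16))
qrels = {0: {"doc3": 1}, 1: {"doc7": 1}}

rankings = retrieve(query_emb, corpus_emb, corpus_ids)
scores = [ndcg_at_k(rankings[qi], qrels[qi]) for qi in qrels]
print(np.mean(scores))
```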
Maybe the problem is that you're using […]
Cosine similarity is generally run on normalized vectors, in which case `np.dot` would be mathematically equivalent, wouldn't it? See e.g. `mteb/mteb/evaluation/evaluators/utils.py`, line 15 in fa5127a.
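A quick numerical check of this point, using random vectors rather than real model embeddings: once both sides are L2-normalized, `np.dot` gives the same scores as cosine similarity computed on the raw vectors. The `cos_sim` and `l2_normalize` helpers here are plain NumPy sketches, not the functions in `mteb`'s `utils.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))  # stand-in query embeddings
corpus = rng.normal(size=(10, 8))  # stand-in document embeddings


def l2_normalize(x):
    """Scale each row of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cos_sim(a, b):
    """Pairwise cosine similarity: raw dot products divided by both norms."""
    norms = np.outer(np.linalg.norm(a, axis=1), np.linalg.norm(b, axis=1))
    return (a @ b.T) / norms


# Dot product on pre-normalized embeddings vs. cosine on raw embeddings.
dot_scores = np.dot(l2_normalize(queries), l2_normalize(corpus).T)
cos_scores = cos_sim(queries, corpus)

print(np.allclose(dot_scores, cos_scores))  # True
# Matching scores imply matching rankings for every query.
same_ranking = (np.argsort(-dot_scores, axis=1) == np.argsort(-cos_scores, axis=1)).all()
print(same_ranking)
```

The equivalence only holds if the embeddings are actually normalized before the dot product; skipping that step brings back the magnitude effect.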
Anyhow, this is a much smaller issue than it seemed: the main problem is that CoIR shouldn't add prefixes by default!
Hi there!
When reviewing the new gte-modernbert-base model, I struggled to reproduce their CoIR results with the `coir` library. After a bit of digging and a pointer from the authors, it appears that the `mteb` library matches their results, but that those are wildly different from what `coir` reports!

Recently, there have also been some discussions about mismatched code retrieval results for the new SFR model vs Voyager (here), and while I haven't yet had time to test it out, the magnitude of the discrepancies appears to be fairly similar to what I'm seeing, so this could be the issue.
Even more puzzling, in trying to figure out which one was correct, I whipped up an extremely simple ST + `ranx` notebook, and it gave me results that... matched neither library 😭, although it was way closer to `mteb` than to `coir`. This was put together very quickly late at night, so there might be a silly mistake somewhere in there causing the issues.

I've put together a repository to reproduce the exact issue with minimal scripts, using exactly the code I ran.
Direct links:
- `coir` code
- `ranx` code
- `mteb` code

Let me know if I can do anything else to help diagnose this!
cc @tomaarsen @orionw @Muennighoff