I tried reproducing the results for the jinaai/jina-embeddings-v2-small-en model on the MTEB HotpotQA task, but my results are significantly lower than the self-reported results shared on HuggingFace. Specifically, my ndcg_at_10 score is 1.966, while the result reported on HuggingFace for the same task and metric is 56.482.
Here’s the code I used:
import mteb

model_names = [
    "jinaai/jina-embeddings-v2-small-en",
]

tasks = [
    mteb.get_task("HotpotQA", languages=["eng"]),
]

for model_name in model_names:
    model = mteb.get_model(model_name)  # if the model is not implemented in MTEB it will be eq. to SentenceTransformer(model_name)
    evaluation = mteb.MTEB(tasks=tasks)
    results = evaluation.run(model, output_folder="./results", batch_size=2)
I think this is how transformers works: it allows the model to load, but the weights are randomly initialized. A warning message will show up in the terminal but won't interrupt the script.
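If that's what's happening here, one thing to try (a sketch, not verified against this exact setup) is to instantiate the SentenceTransformer yourself with trust_remote_code=True, since the jina-embeddings-v2 models ship custom modeling code, and then pass the loaded model to MTEB instead of the model name:

import mteb
from sentence_transformers import SentenceTransformer

# Possible workaround (assumption, not confirmed for this issue): load the model
# explicitly with trust_remote_code=True so the custom jina architecture and its
# pretrained weights are used instead of a randomly initialized fallback.
model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-small-en",
    trust_remote_code=True,
)

tasks = [mteb.get_task("HotpotQA", languages=["eng"])]
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="./results", batch_size=2)

If the warning about randomly initialized weights no longer appears when the model loads, the scores should be much closer to the reported ones.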
Below is my result for HotpotQA:
Are there additional configurations or preprocessing steps needed to achieve the reported performance?