Integrate JMTEB #749

Closed
Muennighoff opened this issue May 16, 2024 · 6 comments
Labels
good first issue (Good for newcomers), new dataset (Issues related to adding a new task or dataset)

Comments

@Muennighoff
Contributor

It would be great to integrate into MTEB the datasets from JMTEB (https://github.com/sbintuitions/JMTEB) that aren't already there, so we can also add a Japanese leaderboard sometime 😊 cc @lsz05 @ryokan0123 @masaya-ohagi & anyone who may be interested

@awinml
Contributor

awinml commented May 20, 2024

I would like to help integrate JMTEB. The following datasets from JMTEB are already present in MTEB:

The following ones would need to be added:

@AlexeyVatolin
Contributor

@Muennighoff, I can take over the integration of ESCI. I have already looked at the contents of this dataset. In the original dataset https://huggingface.co/datasets/tasksource/esci there are three languages: English, Spanish and Japanese. Do we want to add all three languages or just Japanese?

@Muennighoff
Contributor Author

> @Muennighoff, I can take over the integration of ESCI. I have already looked at the contents of this dataset. In the original dataset https://huggingface.co/datasets/tasksource/esci there are three languages: English, Spanish and Japanese. Do we want to add all three languages or just Japanese?

I think if we can add all 3 that'd be even better! 🙌

@KennethEnevoldsen
Contributor

@awinml do you still want to finish up this issue, or should I mark it as "up for grabs"?

@awinml
Contributor

awinml commented Sep 9, 2024

@KennethEnevoldsen, I don't have the bandwidth at the moment to add the remaining three datasets. Feel free to mark it as open for contributions. I’m happy to help review any PRs that come in!

KennethEnevoldsen added the "good first issue" and "new dataset" labels on Sep 9, 2024
@KennethEnevoldsen
Contributor

I suspected as much - thanks for the help so far though

5 participants