Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add error reporting for Retrieval #873

Merged
merged 10 commits into from
Jun 5, 2024

Conversation

jordiclive
Copy link
Contributor

@jordiclive jordiclive commented Jun 3, 2024

Adding Error reporting as discussed in #867.
(- Add cmd line kwargs) cmd line is under overhaul, will update if gets added to all tasks.

  • Separate tasks to be done in another PR

Copy link
Collaborator

@isaac-chung isaac-chung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! I think we can keep this PR's scope within retrieval first, then extend it to other tasks in a separate PR.

Could you see if you can get it running via command line, e.g. mteb -m intfloat/multilingual-e5-base -t kardosdrur/estonian-qa --export_errors ? That way:
a) we know that it works, and
b) the kwargs would be properly passed down. Feel free to start from mteb/cmd.py

mteb/abstasks/AbsTaskRetrieval.py Outdated Show resolved Hide resolved
mteb/abstasks/AbsTaskRetrieval.py Outdated Show resolved Hide resolved
@isaac-chung isaac-chung self-assigned this Jun 4, 2024
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
@jordiclive
Copy link
Contributor Author

@isaac-chung , Great! Happy to work on this and for other tasks.

Does mteb -m intfloat/multilingual-e5-base -t kardosdrur/estonian-qa work for you?

File "mteb/mteb/overview.py", line 230, in get_task
    return TASKS_REGISTRY[task_name]().filter_languages(languages, script)
           ~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'kardosdrur/estonian-qa'

I think kwargs like save_predictions are not supported in the cmd line, would we need a change like #877

@isaac-chung
Copy link
Collaborator

My suggestion is to add stuff like the command line option in this PR. At the end of the day, we want to be sure that whatever changes made will work. So that could be an added test, or extending the needed components to support running it on the command line.

Does mteb -m intfloat/multilingual-e5-base -t kardosdrur/estonian-qa work for you?

Oh my bad. Should have used the task name, not the dataset name. Try mteb -m intfloat/multilingual-e5-base -t EstQA instead?

@jordiclive jordiclive changed the title add error reporting. add error reporting for Retrieval Jun 4, 2024
@jordiclive jordiclive marked this pull request as ready for review June 4, 2024 21:28
@jordiclive
Copy link
Contributor Author

jordiclive commented Jun 4, 2024

@isaac-chung Great thanks, I added kwargs option (works for save_predictions as well) in this PR and checked it works with
mteb -m intfloat/multilingual-e5-base -t EstQA --export_errors True

Copy link
Collaborator

@isaac-chung isaac-chung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! Just a few small comments. I'll also enable the tests now.

mteb/cmd.py Outdated Show resolved Hide resolved
mteb/abstasks/AbsTaskRetrieval.py Outdated Show resolved Hide resolved
mteb/abstasks/AbsTaskRetrieval.py Show resolved Hide resolved
@isaac-chung isaac-chung changed the title add error reporting for Retrieval fix: Add error reporting for Retrieval Jun 5, 2024
jordiclive and others added 2 commits June 5, 2024 08:32
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
@jordiclive
Copy link
Contributor Author

@isaac-chung thanks, changed kwargs.

@KennethEnevoldsen
Copy link
Contributor

@jordiclive and @isaac-chung just letting you both know that I made an overhaul to the cli in #882. From my reading of this PR though they should still be compatible (I can just add the arguments into the mteb run parser).

However, when reading through it does seem problematic to add a flag that only works for some of the tasks. Instead, I would add in the non-cli changes, making it possible to create a Python script for running it locally (or alternatively the new CLI construction can be extended in a python script):

python my_mteb_cli.py ... # same as mteb, just with the new arguments added to it

In the future, we can then fully fledge out the flag such that errors are saved for all task types.

Let me know what you think about this - the intend here is to make the CLI more transparent not to remove flexibility from the user.

@jordiclive
Copy link
Contributor Author

@KennethEnevoldsen ok so remove the cli? Just like save_predictions cannot be enabled through the cmd line and is a kwarg for MTEB.run. In a previous commit it was optional kwargs to cmd line

Yes, I was planning to try and implement this feature for the other tasks.

OK cool with your PR, good to neaten up cmd line and documentation—I think I tried the current main this morning and there were some issues with normal running and output folder.

@KennethEnevoldsen
Copy link
Contributor

Yea remove it for now and then we can add it back into the CLI, when it is available for all tasks.

If you experience any issues, please submit an issue and we will make sure to take a look at it.

@jordiclive
Copy link
Contributor Author

ok, done.

Will do.

@isaac-chung
Copy link
Collaborator

However, when reading through it does seem problematic to add a flag that only works for some of the tasks.

@KennethEnevoldsen Yes that makes sense. I originally thought the CLI would have been a convenient avenue to test out the non-cli changes. should have stuck with the test suite 😅 thanks for taking a look!

@isaac-chung
Copy link
Collaborator

@jordiclive thanks for your contributions! I'll merge this in now.

@isaac-chung isaac-chung merged commit 5397bd2 into embeddings-benchmark:main Jun 5, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants