Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added prompting along with e5 instruct #888

Merged
merged 11 commits into from
Jun 15, 2024
Merged

Conversation

KennethEnevoldsen
Copy link
Contributor

All encode calls now include a prompt_name (following the variable name in sentence transformer's encode), which provided the task name. The model can then use a custom prompt pr. task if they wish (or encode based on e.g. task type).

Additionally to test it out I have also added the two e5 instruct models.

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision_id) and
    • mteb.get_model_meta(model_name, revision_id)
  • I have tested the implementation works on a representative set of tasks.

All encode calls now include a prompt_name (following the variable name in sentence transformer's encode), which provided the task name. The model can then use a custom prompt pr. task if they wish (or encode based on e.g. task type).

Additionally to test it out I have also added the two e5 instruct models.
Copy link
Contributor

@imenelydiaker imenelydiaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left some comments 🙂

mteb/models/instructions.py Show resolved Hide resolved
mteb/models/instructions.py Show resolved Hide resolved
mteb/evaluation/evaluators/normalize_encode.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Muennighoff Muennighoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing 🙌

mteb/evaluation/evaluators/model_encode.py Show resolved Hide resolved
mteb/models/instructions.py Outdated Show resolved Hide resolved
mteb/models/instructions.py Show resolved Hide resolved
mteb/models/e5_instruct.py Show resolved Hide resolved
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Copy link
Contributor

@Muennighoff Muennighoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! It looks great tho would love if @orionw could have a quick look whether or not this aligns with Retrieval w/ Instructions 🙌

mteb/models/__init__.py Outdated Show resolved Hide resolved
mteb/models/instructions.py Show resolved Hide resolved
tests/test_RetrievalEvaluator.py Outdated Show resolved Hide resolved
@orionw
Copy link
Contributor

orionw commented Jun 14, 2024

Looks great @KennethEnevoldsen! Using the same as Retrieval for InstructionRetrieval is great and it will be nice for all the many models using it these days. For many tasks in InstructionRetrieval the models did better with the prefix and the instruction so I added them both manually, glad to have it be part of this. Thanks for the ping @Muennighoff

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
mteb/encoder_interface.py Show resolved Hide resolved
mteb/models/instructions.py Show resolved Hide resolved
mteb/models/instructions.py Show resolved Hide resolved
@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) June 15, 2024 12:04
@KennethEnevoldsen KennethEnevoldsen merged commit 40208cf into main Jun 15, 2024
7 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the add-e5-instruct branch June 15, 2024 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants