Use stratification in multilabel tasks. #850

Closed
9 tasks
x-tabdeveloping opened this issue May 30, 2024 · 3 comments
Labels
enhancement, good first issue, help wanted

Comments

@x-tabdeveloping
Collaborator

Several multilabel tasks currently use dummy subsampling or no subsampling at all, which can lead to unrealistic results.
PR #760, which introduces multilabel stratification, will be merged soon, and it would be very useful to convert all of these tasks to use it.

These are:

  • Hierarchical clustering tasks:
    • SNL
    • VG
    • ArXiv
    • Blurbs
  • Multilabel classification tasks:
    • MalteseNews
    • MultiEURLEX
    • BrazilianToxicTweets
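
For anyone picking this up, here is a minimal sketch of what stratified subsampling for multilabel data could look like. It is not the implementation from #760; it is a simplified two-way, rarest-label-first greedy heuristic in the spirit of iterative stratification. The function name `stratified_multilabel_subsample` and its parameters are hypothetical, and the returned size is only approximately `n_samples` in general.

```python
import random
from collections import defaultdict


def stratified_multilabel_subsample(labels, n_samples, seed=42):
    """Pick roughly n_samples indices while keeping per-label proportions.

    Hypothetical helper for illustration only (not the mteb API).
    `labels` is a list where labels[i] is the collection of labels of example i.
    """
    rng = random.Random(seed)
    n_total = len(labels)
    if n_samples >= n_total:
        return list(range(n_total))
    ratio = n_samples / n_total

    # Map each label to the indices of the still-unassigned examples carrying it.
    members = defaultdict(set)
    for i, labs in enumerate(labels):
        for lab in set(labs):
            members[lab].add(i)

    # Fold 0 is the subsample, fold 1 is everything else.
    # want[fold][label] is how many more examples of that label the fold needs.
    want = [
        {lab: len(idx) * ratio for lab, idx in members.items()},
        {lab: len(idx) * (1 - ratio) for lab, idx in members.items()},
    ]
    size_want = [float(n_samples), float(n_total - n_samples)]
    unassigned = set(range(n_total))
    chosen = []

    def assign(i, fold):
        unassigned.discard(i)
        size_want[fold] -= 1
        for lab in set(labels[i]):
            want[fold][lab] -= 1
            members[lab].discard(i)
        if fold == 0:
            chosen.append(i)

    while True:
        live = [lab for lab, idx in members.items() if idx]
        if not live:
            break
        # Handle the rarest remaining label first, since it is hardest to balance.
        rarest = min(live, key=lambda lab: (len(members[lab]), str(lab)))
        batch = sorted(members[rarest])
        rng.shuffle(batch)
        for i in batch:
            # Prefer the fold that still needs this label most, then the emptier fold.
            key0 = (want[0][rarest], size_want[0])
            key1 = (want[1][rarest], size_want[1])
            if key0 == key1:
                fold = rng.randrange(2)
            else:
                fold = 0 if key0 > key1 else 1
            assign(i, fold)

    # Examples without any label just fill the subsample up to its target size.
    leftovers = sorted(unassigned)
    rng.shuffle(leftovers)
    for i in leftovers:
        assign(i, 0 if size_want[0] > 0 else 1)

    return sorted(chosen)


if __name__ == "__main__":
    # Toy data: "a" is common, "b" less so, "rare" appears only four times.
    toy = []
    for i in range(100):
        labs = set()
        if i % 2 == 0:
            labs.add("a")
        if i % 5 == 0:
            labs.add("b")
        if i % 25 == 0:
            labs.add("rare")
        toy.append(labs)
    idx = stratified_multilabel_subsample(toy, n_samples=50)
    for lab in ("a", "b", "rare"):
        print(lab, sum(lab in toy[i] for i in idx))
```

Plain random subsampling can easily drop a rare label entirely; a greedy scheme like this tries to keep every label's share in the subsample close to its share in the full set.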
@x-tabdeveloping added the enhancement, good first issue, and help wanted labels May 30, 2024
@x-tabdeveloping
Collaborator Author

@KennethEnevoldsen @imenelydiaker @orionw @isaac-chung @Muennighoff Is this something we want to get done quickly before running everything, or should we just roll with things as they are?

@KennethEnevoldsen
Contributor

The current subsampling is just random. I don't believe that is a reason to convert the existing tasks (at least the hierarchical ones); we can use stratification for new tasks.

@x-tabdeveloping
Collaborator Author

Alright, I will close this then.

2 participants