fix: Compare Cluster and ClusterFast scores and speedup #892

Merged (33 commits) on Jun 18, 2024

@isaac-chung (Collaborator) commented Jun 7, 2024

1) What

Part of #835 .

  • Runs the clustering tasks of the English benchmark (except Arxiv, since it is hierarchical) for both Clustering and ClusteringFast on e5-small, e5-base, e5-large, and a few more models.
python scripts/run_mteb_english_cluster.py
  • Calculates Spearman's correlation on the v_measures from both task types.
  • Sums the time taken to run each benchmark, then divides the Clustering total by the ClusteringFast total to obtain the speedup multiplier.
python scripts/mteb_english_cluster_spearman.py
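
The two calculations listed above can be sketched as follows. This is a minimal, standard-library illustration and not the contents of scripts/mteb_english_cluster_spearman.py; the function names, the v_measure values, and the timings are all hypothetical. Spearman's correlation is computed as the Pearson correlation of the rank vectors, with ties given their average rank.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def speedup(clustering_seconds, clustering_fast_seconds):
    """Total Clustering time divided by total ClusteringFast time."""
    return sum(clustering_seconds) / sum(clustering_fast_seconds)


# Hypothetical v_measures for four models on one task (not real results).
v_clustering = [0.31, 0.35, 0.38, 0.40]
v_clustering_fast = [0.30, 0.36, 0.34, 0.41]

rho = spearman(v_clustering, v_clustering_fast)
# Hypothetical per-model run times in seconds.
multiplier = speedup([120.0, 200.0, 410.0, 500.0], [10.0, 12.0, 20.0, 40.0])
```

Note that `spearman` divides by zero when either input is constant, so a real script would need to guard that case.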

2) Hardware

These were run on a single A10.

3) Scores on ClusteringFast + Speedup

These are the Spearman correlation scores and speedups from Clustering to ClusteringFast. The old notes are kept in the expandable sections below the table.

4% n_samples for every task, except for RedditClustering and StackExchangeClustering, which are run with 32768.

In addition, p-values for each model are reported.

| Task | Spearman | Significant Spearman | Speedup |
| --- | --- | --- | --- |
| BiorxivClusteringP2P | 0.9505 | 0.9390 | 31.50x |
| BiorxivClusteringS2S | 0.9890 | 0.9679 | 14.31x |
| MedrxivClusteringP2P | 0.9615 | 0.8200 | 21.48x |
| MedrxivClusteringS2S | 0.9560 | 0.9510 | 8.39x |
| RedditClustering | 0.9670 | 0.9790 | 11.72x |
| RedditClusteringP2P | 0.9670 | 0.7370 | 22.77x |
| StackExchangeClustering | 0.9121 | 0.9486 | 9.55x |
| StackExchangeClusteringP2P | 0.9670 | 0.9497 | 20.20x |
| TwentyNewsgroupsClustering | 1.0000 | 0.9832 | 5.02x |
| Average | 0.9634 | 0.9195 | 16.11x |

Old notes comparing across tasks.

a) Current Implementation

| Task | Spearman | Speedup |
| --- | --- | --- |
| BiorxivClusteringP2P | 0.5 | 34.46x |
| BiorxivClusteringS2S | 1.0 | 20.11x |
| MedrxivClusteringP2P | 0.5 | 17.30x |
| MedrxivClusteringS2S | 1.0 | 10.30x |
| RedditClustering | 1.0 | 23.15x |
| RedditClusteringP2P | 1.0 | 27.32x |
| StackExchangeClustering | 1.0 | 18.71x |
| StackExchangeClusteringP2P | 0.5 | 4.50x |
| TwentyNewsgroupsClustering | 1.0 | 11.44x |

b) New Commit: b8a5be4

| Task | Spearman | Speedup |
| --- | --- | --- |
| BiorxivClusteringP2P | 0.5 | 3.68x |
| BiorxivClusteringS2S | 1.0 | 3.46x |
| MedrxivClusteringP2P | 1.0 | 1.83x |
| MedrxivClusteringS2S | 1.0 | 1.80x |
| RedditClustering | 1.0 | 17.38x |
| RedditClusteringP2P | 1.0 | 21.55x |
| StackExchangeClustering | 0.5 | 14.43x |
| StackExchangeClusteringP2P | 0.5 | 3.68x |
| TwentyNewsgroupsClustering | 1.0 | 2.61x |

c) New Commit: 6295dcc (swap max_documents_per_cluster and max_documents_to_embed values)

| Task | Spearman | Speedup |
| --- | --- | --- |
| BiorxivClusteringP2P | 0.5 | 34.38x |
| BiorxivClusteringS2S | 1.0 | 14.12x |
| MedrxivClusteringP2P | 1.0 | 17.09x |
| MedrxivClusteringS2S | 1.0 | 8.63x |
| RedditClustering | 1.0 | 75.32x |
| RedditClusteringP2P | 1.0 | 156.63x |
| StackExchangeClustering | 1.0 | 46.77x |
| StackExchangeClusteringP2P | 1.0 | 29.39x |
| TwentyNewsgroupsClustering | 1.0 | 9.68x |

d) New Commit: 3a0d9c4 (c + increasing max_documents_per_cluster to 65536)

| Task | Spearman | Speedup |
| --- | --- | --- |
| BiorxivClusteringP2P | 0.5 | 31.38x |
| BiorxivClusteringS2S | 1.0 | 10.44x |
| MedrxivClusteringP2P | 0.5 | 15.58x |
| MedrxivClusteringS2S | 1.0 | 5.54x |
| RedditClustering | 1.0 | 50.16x |
| RedditClusteringP2P | 1.0 | 132.17x |
| StackExchangeClustering | 1.0 | 32.46x |
| StackExchangeClusteringP2P | 1.0 | 25.43x |
| TwentyNewsgroupsClustering | 1.0 | 5.97x |

Old notes comparing across models.

a) Current Implementation

| Model | Spearman | Speedup |
| --- | --- | --- |
| e5-small | 0.7833 | 14.56x |
| e5-base | 0.8333 | 15.80x |
| e5-large | 0.8000 | 16.93x |

b) New Commit: b8a5be4

| Model | Spearman | Speedup |
| --- | --- | --- |
| e5-small | 0.8500 | 5.76x |
| e5-base | 0.8833 | 5.97x |
| e5-large | 0.8000 | 6.17x |

c) New Commit: 6295dcc (swap max_documents_per_cluster and max_documents_to_embed values)

| Model | Spearman | Speedup |
| --- | --- | --- |
| e5-small | 0.8833 | 32.05x |
| e5-base | 0.9000 | 40.41x |
| e5-large | 0.9000 | 52.77x |

d) New Commit: 3a0d9c4 (c + increasing max_documents_per_cluster to 65536)

| Model | Spearman | Speedup |
| --- | --- | --- |
| e5-small | 0.8833 | 22.21x |
| e5-base | 0.9000 | 32.11x |
| e5-large | 0.9500 | 47.11x |

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

@isaac-chung changed the title [WIP] First go at getting spearman corr for e5-base to [WIP] Compare Cluster and ClusterFast scores and speedup Jun 7, 2024

@KennethEnevoldsen (Contributor) left a comment

This looks great @isaac-chung! A few surprising results here.

My concern is that the new method (resampling many documents) produces results closer to the original (which is nice), but what we are really interested in is whether the two produce the same model ordering, with the same (or better) power to differentiate between similar models.
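
One way to make "same model ordering" concrete is sketched below. It is only an illustration under stated assumptions: the helper names are hypothetical, the scores are made up rather than taken from this PR, and tied pairs are counted as disagreements.

```python
from itertools import combinations


def model_ordering(scores):
    """Return model names sorted from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)


def pairwise_agreement(scores_a, scores_b):
    """Fraction of model pairs ranked in the same order by both score sets."""
    pairs = list(combinations(sorted(scores_a), 2))
    agree = sum(
        1
        for m, n in pairs
        # Same sign of the score difference means the pair is ordered alike.
        if (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n]) > 0
    )
    return agree / len(pairs)


# Hypothetical v_measures (not taken from the PR's results).
clustering = {"e5-small": 0.31, "e5-base": 0.35, "e5-large": 0.38}
clustering_fast = {"e5-small": 0.30, "e5-base": 0.36, "e5-large": 0.34}

same_order = model_ordering(clustering) == model_ordering(clustering_fast)
agreement = pairwise_agreement(clustering, clustering_fast)
```

In this made-up example the two orderings differ because e5-base and e5-large swap places, so two of the three model pairs agree.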

Though resampling many times does seem to be quite cheap.
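
A rough sketch of why repeated resampling can be cheap, assuming the embeddings are computed once and each resample only re-scores a subset of them. This is not the code in AbsTaskClusteringFast; `resampled_scores` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for the real clustering plus v_measure step.

```python
import random


def resampled_scores(embeddings, labels, score_fn, n_resamples, sample_size, seed=0):
    """Score n_resamples random subsets of already-computed embeddings."""
    rng = random.Random(seed)
    indices = range(len(embeddings))
    scores = []
    for _ in range(n_resamples):
        # Sample with replacement (bootstrap); no model call happens here.
        chosen = rng.choices(indices, k=sample_size)
        scores.append(
            score_fn([embeddings[i] for i in chosen], [labels[i] for i in chosen])
        )
    return scores


def toy_score(embs, labs):
    """Stand-in scorer: share of points whose first coordinate sign matches the label."""
    return sum((e[0] > 0) == bool(l) for e, l in zip(embs, labs)) / len(labs)


# Hypothetical one-dimensional embeddings and binary labels.
embeddings = [(-1.0,), (-0.5,), (0.7,), (1.2,)]
labels = [0, 0, 1, 1]
scores = resampled_scores(embeddings, labels, toy_score, n_resamples=5, sample_size=3)
```

The expensive step (embedding the documents) sits outside the loop, so adding more resamples only adds more calls to the scoring function.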

mteb/tasks/Clustering/eng/BiorxivClusteringP2P.py (outdated, resolved)
scripts/run_mteb_english_cluster.py (resolved)
scripts/run_mteb_english_cluster.py (outdated, resolved)
scripts/mteb_english_cluster_spearman.py (outdated, resolved)

@isaac-chung (Collaborator, Author) commented Jun 14, 2024

Looking ahead to what's needed to close/merge this PR, the following should be completed first:

Anything else I'm missing?

@KennethEnevoldsen (Contributor)

I think that is good.

@isaac-chung (Collaborator, Author)

Added plotting scripts and plots here. Will remove them from this PR, then we should be good to merge.

@isaac-chung marked this pull request as ready for review June 17, 2024 15:22
@isaac-chung changed the title [WIP] Compare Cluster and ClusterFast scores and speedup to fix: Compare Cluster and ClusterFast scores and speedup Jun 17, 2024

@isaac-chung (Collaborator, Author)

@KennethEnevoldsen just wanted a sanity ✅ before we merge, thanks!

@KennethEnevoldsen (Contributor) left a comment

A few minor comments; otherwise this looks good. Once these are in, feel free to merge. Thanks for taking the time on this one!

mteb/abstasks/AbsTaskClusteringFast.py (resolved)
mteb/abstasks/AbsTaskClusteringFast.py (outdated, resolved)

@isaac-chung isaac-chung merged commit 2bb7623 into main Jun 18, 2024
7 checks passed
@isaac-chung isaac-chung deleted the compare-scores-clustering-fast branch June 18, 2024 10:15
@isaac-chung isaac-chung mentioned this pull request Jun 19, 2024
2 tasks