v2.6.1 - Fix Quantized Semantic Search rescoring

@tomaarsen released this 26 Mar 09:01

This patch release fixes a rescoring bug in semantic_search_faiss and semantic_search_usearch that caused the returned scores to not correspond to the returned corpus indices. Additionally, you can now evaluate embedding models after quantizing their embeddings.
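To illustrate the invariant this fix restores, here is a minimal pure-Python sketch of quantized retrieval followed by rescoring. It is not the library's implementation: the helper names (to_bits, hamming, dot, search_with_rescoring), the rescore_multiplier parameter as used here, and the choice to rescore against full-precision corpus vectors are all assumptions made for illustration. The point is only that each score stays paired with its corpus index through the final sort.

```python
from typing import List, Tuple


def to_bits(vector: List[float]) -> List[int]:
    """Binary-quantize a vector: 1 where the value is positive, else 0."""
    return [1 if value > 0 else 0 for value in vector]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, used here as the full-precision score."""
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(
    query: List[float],
    corpus: List[List[float]],
    top_k: int = 2,
    rescore_multiplier: int = 2,  # hypothetical oversampling factor
) -> Tuple[List[float], List[int]]:
    query_bits = to_bits(query)
    corpus_bits = [to_bits(doc) for doc in corpus]

    # Step 1: cheap candidate retrieval on the quantized vectors.
    num_candidates = min(top_k * rescore_multiplier, len(corpus))
    candidates = sorted(
        range(len(corpus)), key=lambda i: hamming(query_bits, corpus_bits[i])
    )[:num_candidates]

    # Step 2: rescore the candidates, keeping each score next to its index.
    rescored = [(dot(query, corpus[i]), i) for i in candidates]

    # Step 3: sort the (score, index) pairs together so they cannot drift apart.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    top = rescored[:top_k]
    return [score for score, _ in top], [index for _, index in top]


query = [0.9, 0.1, -0.2, 0.4]
corpus = [
    [0.1, 0.2, -0.1, 0.1],
    [1.0, 0.5, -0.5, 0.8],
    [0.8, -0.3, -0.1, 0.5],
    [-0.5, -0.5, 0.5, -0.5],
]
scores, indices = search_with_rescoring(query, corpus)
print(indices)
```

Sorting the scores and the indices as separate lists is the kind of mistake that leads to the mismatch described above; sorting the pairs avoids it by construction.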

Precision support in EmbeddingSimilarityEvaluator

You can now pass a precision argument to the EmbeddingSimilarityEvaluator to measure performance after the embeddings are quantized to that precision:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
import datasets

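# Load a pretrained embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Load the STS Benchmark test split (sentence pairs with gold scores from 0 to 5)
stsb = datasets.load_dataset("mteb/stsbenchmark-sts", split="test")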

print("Spearman correlation based on Cosine Similarity on the STS Benchmark test set:")
for precision in ["float32", "uint8", "int8", "ubinary", "binary"]:
    evaluator = EmbeddingSimilarityEvaluator(
        stsb["sentence1"],
        stsb["sentence2"],
        [score / 5 for score in stsb["score"]],  # normalize gold scores to [0, 1]
        main_similarity=SimilarityFunction.COSINE,
        name="sts-test",
        precision=precision,
    )
    print(precision, evaluator(model))

Spearman correlation based on Cosine Similarity on the STS Benchmark test set:
float32 0.8342190421330611
uint8 0.8260094846238505
int8 0.8312754408857808
ubinary 0.8244338431442343
binary 0.8244338431442343
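
The ubinary and binary rows in the output above are identical. One plausible reading, sketched below in pure Python, is that both precisions store the same sign bits and differ only in whether each packed byte is kept as an unsigned value (0 to 255) or shifted into the signed range (-128 to 127). This is an assumption about the encoding, not something these release notes state, and pack_bits and to_signed are hypothetical helpers rather than library functions.

```python
from typing import List


def pack_bits(vector: List[float]) -> List[int]:
    """Threshold at zero and pack every 8 sign bits into one byte (0..255).

    The first value of each group becomes the most significant bit.
    """
    bits = [1 if value > 0 else 0 for value in vector]
    bits += [0] * (-len(bits) % 8)  # pad to a multiple of 8
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed


def to_signed(packed: List[int]) -> List[int]:
    """Shift unsigned bytes into the signed range (assumed 'binary' layout)."""
    return [byte - 128 for byte in packed]


embedding = [0.3, -0.1, 0.5, 0.2, -0.7, -0.2, 0.9, -0.4]
unsigned_bytes = pack_bits(embedding)      # bits 10110010
signed_bytes = to_signed(unsigned_bytes)
print(unsigned_bytes, signed_bytes)        # [178] [50]
```

Under that assumption, the shift is lossless, so any similarity computed from the underlying bits would come out the same for both precisions.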

All changes

  • Add 'precision' support to the EmbeddingSimilarityEvaluator by @tomaarsen in #2559
  • [hotfix] Quantization patch; fix semantic_search_faiss/semantic_search_usearch rescoring by @tomaarsen in #2558
  • Fix a typo in a docstring in CosineSimilarityLoss.py by @bryant1410 in #2553

Full Changelog: v2.6.0...v2.6.1