v0.3.2 - Lazy tokenization for Parallel Sentence Training & Improved Semantic Search
This is a minor release. There should be no breaking changes.
- ParallelSentencesDataset: Datasets are tokenized on-the-fly, saving some start-up time
- util.pytorch_cos_sim: New method to compute cosine similarity with PyTorch. It is about 100 times faster than scipy cdist. The semantic_search.py example has been updated accordingly.
- SentenceTransformer.encode: New parameter convert_to_tensor. If set to True, encode returns one large PyTorch tensor containing your embeddings.
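
To make the cosine-similarity entry concrete, here is a minimal plain-Python sketch of what a pairwise cosine-similarity matrix contains and how a semantic search can rank corpus entries by it. This is not the library's implementation: `pairwise_cos_sim` and the toy vectors are hypothetical names made up for illustration. A tensor-based version would express the same computation as one matrix product over normalized embeddings.

```python
import math

def pairwise_cos_sim(a, b):
    """Return a len(a) x len(b) matrix of cosine similarities (illustrative sketch)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        # Leave all-zero vectors unchanged to avoid dividing by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    a_norm = [normalize(v) for v in a]
    b_norm = [normalize(v) for v in b]
    # After normalization, cosine similarity reduces to a dot product.
    return [[sum(x * y for x, y in zip(u, v)) for v in b_norm] for u in a_norm]

# Toy 2-dimensional "embeddings"; real sentence embeddings are much larger.
query_embeddings = [[1.0, 0.0]]
corpus_embeddings = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

scores = pairwise_cos_sim(query_embeddings, corpus_embeddings)

# Rank corpus entries for the first query, best match first.
ranking = sorted(range(len(corpus_embeddings)),
                 key=lambda i: scores[0][i], reverse=True)
print(ranking)
```

Each row of `scores` corresponds to one query and each column to one corpus entry, so sorting a row by descending score gives the search result order for that query.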