Releases · Tiiiger/bert_score
Version 0.3.3
- Fixing the bug with empty strings (issue #47).
- Supporting 6 ELECTRA models and 24 smaller BERT models (see the example below).
- A new Google sheet tracking the performance (i.e., Pearson correlation with human judgment) of different models on WMT16 to-English.
- Including the script for tuning the best number of layers of an English pre-trained model on WMT16 to-English data (see the repository for details).
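For instance, a minimal sketch of selecting one of the newly supported models through the `model_type` argument; the ELECTRA identifier below is an assumed Hugging Face hub name and the sentences are illustrative:

```python
from bert_score import score

cands = ["The quick brown fox jumps over the lazy dog."]
refs = ["A fast brown fox leaps over a lazy dog."]

# Sketch only: "google/electra-base-discriminator" is an assumed Hugging Face model id.
# For models known to bert_score, the tuned layer is picked automatically;
# otherwise pass num_layers explicitly.
P, R, F1 = score(cands, refs, model_type="google/electra-base-discriminator")
print(F1.mean().item())
```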
Version 0.3.2
- Bug fixed: corrected the bug in v0.3.1 that occurred when using multiple reference sentences.
- Supporting multiple reference sentences with our command-line tool
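A sketch of the command-line usage, assuming `-r` accepts several reference files (one alternative reference per candidate in each file); the file names are illustrative:

```
bert-score -r refs_main.txt refs_alt.txt -c hyps.txt --lang en
```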
Version 0.3.1
- A new `BERTScorer` object that caches the model to avoid re-loading it multiple times. Please see our Jupyter notebook example for the usage.
- Supporting multiple reference sentences for each example. The `score` function can now take a list of lists of strings as the references and return the score between the candidate sentence and its closest reference sentence.
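A minimal sketch of both features, with illustrative sentences: the scorer is constructed once and reused, and each candidate is paired with a list of alternative references.

```python
from bert_score import BERTScorer

# Construct once; the underlying model is cached inside the object,
# so repeated score() calls do not re-load it.
scorer = BERTScorer(lang="en")

cands = ["The cat sat on the mat."]
refs = [["A cat was sitting on the mat.", "There is a cat on the mat."]]  # one list of references per candidate

# Each candidate is scored against its closest reference.
P, R, F1 = scorer.score(cands, refs)
```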
Version 0.3.0
- Supporting Baseline Rescaling: we apply a simple linear transformation to enhance the readability of BERTScore using pre-computed "baselines". It has been pointed out (e.g., in #20 and #23) that the numerical range of BERTScore is exceedingly small when computed with RoBERTa models. In other words, although BERTScore correctly distinguishes examples through ranking, the numerical scores of good and bad examples are very similar. We detail our approach in a separate post.
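A minimal sketch of such a linear rescaling, assuming the pre-computed baseline value for the chosen model and language is already available:

```python
def rescale(raw_score: float, baseline: float) -> float:
    # Map the baseline score to 0 while keeping a perfect score of 1.0 at 1.0,
    # spreading typical outputs over a wider, more readable range.
    return (raw_score - baseline) / (1.0 - baseline)
```

In the package, this behavior is exposed through the `rescale_with_baseline` argument.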
Version 0.2.3
- Supporting DistilBERT (Sanh et al.), ALBERT (Lan et al.), and XLM-R (Conneau et al.) models (see the example below).
- Including the version of huggingface's `transformers` in the hash code for reproducibility.
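A sketch of selecting one of these models explicitly; the identifier below is the standard Hugging Face name and is an assumption, not taken from these notes:

```python
from bert_score import score

cands = ["this is a test"]
refs = ["this is one test"]

# "distilbert-base-uncased" is an assumed Hugging Face identifier.
# The installed transformers version is folded into the reported hash,
# so results from different environments can be told apart.
P, R, F1 = score(cands, refs, model_type="distilbert-base-uncased")
```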
Version 0.2.2
- Bug fixed: when using `RoBERTaTokenizer`, we now set `add_prefix_space=True`, which was the default setting in huggingface's `pytorch_transformers` (when we ran the experiments in the paper) before they migrated it to `transformers`. This breaking change in `transformers` leads to a lower correlation with human evaluation. To reproduce our RoBERTa results in the paper, please use version `0.2.2`.
- The best number of layers for DistilRoBERTa is included.
- Supporting loading a custom model
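A sketch of loading a custom model, assuming `model_type` may point at a local checkpoint directory and that `num_layers` must be chosen by the user for a model the package does not know; the path and layer count are hypothetical:

```python
from bert_score import score

cands = ["a cat sits on the mat"]
refs = ["there is a cat on the mat"]

# "./my-finetuned-bert" is a hypothetical local checkpoint directory;
# num_layers=8 is an arbitrary, user-chosen layer for this unknown model.
P, R, F1 = score(cands, refs, model_type="./my-finetuned-bert", num_layers=8)
```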
Version 0.2.1
- SciBERT (Beltagy et al.) models are now included. Thanks to AI2 for sharing the models. By default, we use the 9th layer (the same as BERT-base), but this is not tuned.
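A sketch of scoring with SciBERT, assuming the model is referenced by its Hugging Face identifier and the 9th layer is requested explicitly to mirror the untuned default above:

```python
from bert_score import score

cands = ["The enzyme catalyzes the hydrolysis of ATP."]
refs = ["ATP hydrolysis is catalyzed by the enzyme."]

# "allenai/scibert_scivocab_uncased" is an assumed identifier; num_layers=9
# mirrors the untuned default mentioned above.
P, R, F1 = score(cands, refs, model_type="allenai/scibert_scivocab_uncased", num_layers=9)
```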
Version 0.2.0
- Supporting BERT, XLM, XLNet, and RoBERTa models using huggingface's Transformers library
- Automatically picking the best model for a given language
- Automatically picking the best layer for a given model
- IDF weighting is no longer enabled by default, as we show in the new version of the paper that the improvement brought by importance weighting is not consistent
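Importance weighting can still be enabled explicitly; a minimal sketch with illustrative sentences:

```python
from bert_score import score

cands = ["The weather is nice today."]
refs = ["It is a pleasant day outside."]

# IDF weighting is off by default; opt in with idf=True to compare.
P_plain, R_plain, F_plain = score(cands, refs, lang="en")
P_idf, R_idf, F_idf = score(cands, refs, lang="en", idf=True)
```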