update README and test cases for multi ref
felixgwu committed Apr 18, 2020
1 parent 09eeee4 commit 15ca840
Showing 4 changed files with 50 additions and 1 deletion.
11 changes: 10 additions & 1 deletion README.md
@@ -3,6 +3,9 @@

Automatic Evaluation Metric described in the paper [BERTScore: Evaluating Text Generation with BERT](https://arxiv.org/abs/1904.09675) (ICLR 2020).
#### News:
- Updated to version 0.3.2
- **Bug fixed**: fixing the bug in v0.3.1 when having multiple reference sentences.
- Supporting multiple reference sentences with our command line tool.
- Updated to version 0.3.1
- A new `BERTScorer` object that caches the model to avoid re-loading it multiple times. Please see our [jupyter notebook example](./example/Demo.ipynb) for the usage.
- Supporting multiple reference sentences for each example. The `score` function now can take a list of lists of strings as the references and return the score between the candidate sentence and its closest reference sentence.
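The "closest reference" behaviour described in the last item can be pictured as a per-candidate maximum over candidate-reference pair scores. The sketch below is only an illustration in plain Python: the helper name `best_over_refs` and the flat score layout are assumptions, not the library's code, and this diff does not say whether the library picks the closest reference separately for P, R, and F or by a single metric.

```python
from typing import List, Sequence


def best_over_refs(flat_scores: Sequence[float],
                   refs_per_cand: Sequence[int]) -> List[float]:
    """Collapse pair scores to one score per candidate (hypothetical helper).

    flat_scores holds one score per (candidate, reference) pair, grouped by
    candidate; refs_per_cand[i] is the number of references for candidate i.
    """
    best = []
    start = 0
    for n in refs_per_cand:
        if n <= 0:
            raise ValueError("every candidate needs at least one reference")
        # Keep the score of the best-matching reference for this candidate.
        best.append(max(flat_scores[start:start + n]))
        start += n
    if start != len(flat_scores):
        raise ValueError("score count does not match the reference counts")
    return best


# Two candidates with 3 and 2 references respectively (made-up scores).
print(best_over_refs([0.1, 0.9, 0.3, 0.5, 0.2], [3, 2]))  # [0.9, 0.5]
```

The same reduction could be applied to each of the three returned score lists in turn.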
@@ -110,12 +113,18 @@ roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.3.0)-rescaled P: 0.747044 R:

This makes the range of the scores larger and more human-readable. Please see this [post](./journal/rescale_baseline.md) for details.

When you have multiple reference sentences, please use
```sh
bert-score -r example/refs.txt example/refs2.txt -c example/hyps.txt --lang en
```
where the `-r` argument supports an arbitrary number of reference files. Each reference file should have the same number of lines as your candidate/hypothesis file. The i-th line in each reference file corresponds to the i-th line in the candidate file.
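
For the Python API, the same file layout can be turned into the list-of-lists form that `score` accepts. The following is a minimal sketch under stated assumptions: `load_multi_refs` is a hypothetical helper, not part of the package, and whether the command line tool groups lines in exactly this way is not shown in this diff.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def load_multi_refs(ref_paths: Sequence[str],
                    cand_path: str) -> Tuple[List[str], List[List[str]]]:
    """Read candidates and group the i-th line of every reference file.

    Hypothetical helper: returns (cands, refs), where refs[i] lists all
    references for cands[i].
    """
    if not ref_paths:
        raise ValueError("at least one reference file is required")
    cands = Path(cand_path).read_text(encoding="utf-8").splitlines()
    columns = [Path(p).read_text(encoding="utf-8").splitlines()
               for p in ref_paths]
    for path, lines in zip(ref_paths, columns):
        # Each reference file must have one line per candidate line.
        if len(lines) != len(cands):
            raise ValueError(
                f"{path} has {len(lines)} lines, expected {len(cands)}")
    # zip(*columns) pairs up the i-th line of every reference file.
    refs = [list(group) for group in zip(*columns)]
    return cands, refs
```

The result could then be passed along as `bert_score.score(cands, refs, lang="en")`, matching the call shape used in the tests in this commit.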


2. To evaluate text files in other languages:

We currently support the 104 languages in multilingual BERT ([full list](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages)).

Please specify the two-letter abbrevation of the language. For instance, using `--lang zh` for Chinese text.
Please specify the two-letter abbreviation of the language. For instance, using `--lang zh` for Chinese text.

See more options by `bert-score -h`.

10 changes: 10 additions & 0 deletions example/refs2.txt
@@ -0,0 +1,10 @@
A 28-year-old chef who had recently moved to San Francisco was found dead in the stairwell of a local mall this week.
28-Year-Old Chef Found Dead at San Francisco Mall.
But the victim's brother says he can't think of anyone who would want to hurt him, saying
The body found at the Westfield Mall Wednesday morning was identified as 28-year-old San Francisco resident, Frank Galicia, the San Francisco Medical Examiner's Office said.
The San Francisco Police Department said the death was ruled a homicide and an investigation is ongoing.
The victim's brother, Louis Galicia, told ABC station KGO in San Francisco that Frank, previously a line cook in Boston, had landed his dream job as line chef at San Francisco's Sons & Daughters restaurant six months ago.
A spokesperson for Sons & Daughters said that they were "shocked and devastated" by his death.
"We are a small team that operates like a close-knit family and he will be dearly missed," the spokesperson said.
Our thoughts and condolences are with Frank's family and friends at this tough time.
Louis Galicia said Frank initially stayed in hostels, but recently, "Things were eventually going well for him."
14 changes: 14 additions & 0 deletions tests/test_score_function.py
@@ -93,6 +93,20 @@ def test_multi_refs(self):
        self.assertTrue((P_mul - P_best).abs_().max() < EPS)
        self.assertTrue((R_mul - R_best).abs_().max() < EPS)
        self.assertTrue((F_mul - F_best).abs_().max() < EPS)

    def test_multi_refs_working(self):
        cands = ['I like lemons.', 'Hi', 'Hey', 'Hello', 'Go']
        refs = [
            ['I am proud of you.', 'I love lemons.', 'Go go go.'],
            ['I am proud of you.', 'Go go go.'],
            ['Hi'],
            ['I am proud of you.', 'I love lemons.', 'Go go go.', 'hello'],
            ['I am proud of you.', 'Go go go.', 'Go', 'Go to school'],
        ]
        P_mul, R_mul, F_mul = bert_score.score(
            cands, refs, batch_size=3, return_hash=False,
            lang="en", rescale_with_baseline=True
        )


if __name__ == '__main__':
16 changes: 16 additions & 0 deletions tests/test_scorer.py
@@ -81,5 +81,21 @@ def test_multi_refs(self):
        self.assertTrue((R_mul - R_best).abs_().max() < EPS)
        self.assertTrue((F_mul - F_best).abs_().max() < EPS)

    def test_multi_refs_working(self):
        scorer = bert_score.BERTScorer(lang="en", batch_size=3, rescale_with_baseline=True)

        cands = ['I like lemons.', 'Hi', 'Hey', 'Hello', 'Go']
        refs = [
            ['I am proud of you.', 'I love lemons.', 'Go go go.'],
            ['I am proud of you.', 'Go go go.'],
            ['Hi'],
            ['I am proud of you.', 'I love lemons.', 'Go go go.', 'hello'],
            ['I am proud of you.', 'Go go go.', 'Go', 'Go to school'],
        ]
        P_mul, R_mul, F_mul = scorer.score(
            cands, refs,
        )


if __name__ == '__main__':
    unittest.main()
