Fix language model repeated scoring #12

Open
wants to merge 1 commit into
base: master

Conversation

FieldsMedal

This PR fixes repeated language model scoring. When hotwords_scorer->is_character_based and ext_scorer->is_character_based() is false, the language model and hot word scores are calculated repeatedly. In fact, a word-based language model calls the scorer only when space_id is detected. After the modification, we tested all combinations on the dataset.
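The scoring rule described above can be sketched as follows. This is a minimal illustration, not the code changed in this PR: the function names `should_score` and `count_scorer_calls` are hypothetical, and the path is assumed to be a list of already-emitted token ids with blanks removed.

```python
def should_score(new_char, space_id, is_character_based):
    """Decide whether the external scorer is consulted for this extension.

    Character-based scorer: consulted on every emitted token.
    Word-based scorer: consulted only when a word is closed by space_id.
    """
    if is_character_based:
        return True
    return new_char == space_id


def count_scorer_calls(path, space_id, is_character_based):
    """Count how many times the scorer would be called along one path."""
    return sum(1 for c in path if should_score(c, space_id, is_character_based))


# Hypothetical token ids; 45 is the space_id used in the experiments below.
path = [3, 7, 45, 9, 45]
print(count_scorer_calls(path, space_id=45, is_character_based=False))
print(count_scorer_calls(path, space_id=45, is_character_based=True))
```

With a word-based scorer each completed word is scored once, at its closing space. Any additional call between two spaces would count the same word more than once, which is the kind of repetition this PR addresses.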

First audio

Settings: beam_size = 10, num_processes = 1, blank_id = 0, space_id = 45, cutoff_prob = 1 (cutoff_prob is increased so that spaces are generated), alpha = 0.5, beta = 0.5, window_length = 4. hot_words = {'换一': -3.40282e+38, '首歌': -100, '换首歌': 3.40282e+38}

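| No. | Model | Hot words is_character_based | Language model is_character_based | Decoding result (best path) |
|---|---|---|---|---|
| 1 | Neither | * | * | 换一首歌 |
| 2 | Hot words | TRUE | * | 换首歌a<unk> |
| 3 | Hot words | FALSE | * | 换首歌<space>A<space>爱'爱<unk> |
| 4 | Language model | * | TRUE | 换一首歌 |
| 5 | Language model | * | FALSE | 换一首 |
| 6 | Hot words + language model | TRUE | TRUE | 换换首歌<unk> |
| 7 | Hot words + language model | TRUE | FALSE | 一首 |
| 8 | Hot words + language model | FALSE | TRUE | 换首歌<space>A<space>爱'爱<unk> |
| 9 | Hot words + language model | FALSE | FALSE | 换一首 |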
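The nine rows are the full set of scorer combinations: no scorer, hot words only, language model only, and both, with each scorer either character-based or word-based. A small sketch that enumerates them in the same order as the table is given below. The tuple layout `(hot_words_char_based, lm_char_based)` and the use of `None` for "scorer not used" are assumptions made for illustration.

```python
from itertools import product

# None means the scorer is not used (shown as * in the table).
configs = (
    [(None, None)]                                  # No. 1: neither scorer
    + [(hw, None) for hw in (True, False)]          # No. 2-3: hot words only
    + [(None, lm) for lm in (True, False)]          # No. 4-5: language model only
    + list(product((True, False), repeat=2))        # No. 6-9: both scorers
)

for number, (hw_char_based, lm_char_based) in enumerate(configs, start=1):
    print(number, hw_char_based, lm_char_based)
```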

In No. 7 and No. 9 the hot words did not take effect. When the language model's is_character_based is false, a word generated between two spaces must be in the 1-grams or be a prefix of a 1-gram. The hot word '换首歌' is not in the 1-grams.
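The 1-gram constraint can be illustrated with a short check. This is only a sketch: `is_word_or_prefix` is a hypothetical helper, and the unigram set below is made up for the example (the real 1-grams come from the language model, of which the source only states that '换首歌' is absent).

```python
def is_word_or_prefix(candidate, unigrams):
    """True if candidate equals a unigram or is a prefix of one."""
    return any(word.startswith(candidate) for word in unigrams)


# Made-up vocabulary for illustration only.
unigrams = {"换一首", "歌"}

print(is_word_or_prefix("换一", unigrams))    # prefix of "换一首"
print(is_word_or_prefix("换首歌", unigrams))  # neither a unigram nor a prefix
```

Under this constraint a word-based language model never keeps a candidate such as '换首歌' alive between two spaces, so a hot word score attached to it cannot influence the result.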

Second audio

Settings: beam_size = 10, num_processes = 1, blank_id = 0, space_id = 45, cutoff_prob = 1 (cutoff_prob is increased so that spaces are generated), alpha = 0.5, beta = 0.5, window_length = 4. hot_words = {'极点': 550}. The space was set to <space> before compiling ctc_decoder.

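| No. | Model | Hot words is_character_based | Language model is_character_based | Decoding result (best path) |
|---|---|---|---|---|
| 1 | Neither | * | * | 几点了 |
| 2 | Hot words | TRUE | * | 极点极点点了 |
| 3 | Hot words | FALSE | * | 极点<space><space><space><space> |
| 4 | Language model | * | TRUE | 几点啦 |
| 5 | Language model | * | FALSE | 几点啦 |
| 6 | Hot words + language model | TRUE | TRUE | 极点极点极点啦 |
| 7 | Hot words + language model | TRUE | FALSE | 极点<space>极点<space>极点 |
| 8 | Hot words + language model | FALSE | TRUE | 极点<space><space><space><space> |
| 9 | Hot words + language model | FALSE | FALSE | 极点<space>是<space>是<space>是<space> |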
