
LODR RNNLM rescoring requirements #749

Open
ngoel17 opened this issue Dec 9, 2022 · 26 comments

Comments

@ngoel17
Contributor

ngoel17 commented Dec 9, 2022

I am trying to understand the requirements on the RNN LM for LODR rescoring.
I am using something along the lines of the LibriSpeech pruned_transducer_stateless3 recipe, with
https://github.com/k2-fsa/icefall/tree/master/egs/ptb/LM as the prototype for LM training (except 3 layers, 600 dim, and tie-weights true).
I get the following error messages:
2022-12-09 03:13:25,663 INFO [decode1.py:1185] lm filename: 2gram.fst.txt
2022-12-09 03:13:25,796 INFO [decode1.py:1191] num states: 453
2022-12-09 03:13:26,397 INFO [model.py:69] Tying weights
2022-12-09 03:13:26,397 INFO [checkpoint.py:112] Loading checkpoint from ../ngLM/rnnlm-exp/epoch-0.pt
Traceback (most recent call last):
File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1259, in
main()
File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1210, in main
load_checkpoint(
File "/home/ngoel/icefall/icefall/checkpoint.py", line 126, in load_checkpoint
model.load_state_dict(checkpoint["model"], strict=strict)
File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1490, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RnnLmModel:
size mismatch for input_embedding.weight: copying a param with shape torch.Size([500, 600]) from checkpoint, the shape in current model is torch.Size([500, 2048]).
size mismatch for rnn.weight_ih_l0: copying a param with shape torch.Size([2400, 600]) from checkpoint, the shape in current model is torch.Size([8192, 2048]).

It's not clear to me what this message means and how to fix this. Some guidance is appreciated.

@marcoyang1998
Collaborator

It seems to me that the model architecture does not match: you are using the default value (2048). You might need to change --hidden-dim to 600.

@csukuangfj
Collaborator

Can you check that you use the same RNN LM model parameters for both train.py and decode.py?
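If it helps, here is a rough sketch (not from icefall itself; it assumes the checkpoint path shown in your log) that prints the parameter shapes stored in the RNN LM checkpoint, from which you can read off the embedding/hidden dims used at training time and pass the same values to decode.py:

# Sketch only: inspect the shapes stored in the RNN LM checkpoint.
import torch

ckpt = torch.load("../ngLM/rnnlm-exp/epoch-0.pt", map_location="cpu")
for name, tensor in ckpt["model"].items():
    print(name, tuple(tensor.shape))

# For an LSTM-based RnnLmModel:
#   input_embedding.weight -> (vocab_size, embedding_dim)
#   rnn.weight_ih_l0       -> (4 * hidden_dim, embedding_dim)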

@ngoel17
Contributor Author

ngoel17 commented Dec 9, 2022

Thanks. That resolved it.

@ngoel17 ngoel17 closed this as completed Dec 9, 2022
@ngoel17
Contributor Author

ngoel17 commented Dec 9, 2022

I ran into a different issue this time. May I get some guidance on how to resolve it?
It doesn't happen on the very first pass through the code, but it does happen for the first file. The bigram LM seems to load fine, though.

File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1259, in
main()
File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1235, in main
results_dict = decode_dataset(
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 862, in decode_dataset
hyps_dict = decode_one_batch(
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 712, in decode_one_batch
hyp_tokens = modified_beam_search_rnnlm_LODR(
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/beam_search.py", line 2304, in modified_beam_search_rnnlm_LODR
assert current_ngram_score <= 0.0, (
AssertionError: (-inf, -inf)

@csukuangfj
Collaborator

Could you please post the full command you are using to invoke decode.py?

@ngoel17
Contributor Author

ngoel17 commented Dec 9, 2022

python3 -m pdb ./pruned_transducer_stateless3/decode1.py \
  --iter 480000 \
  --avg 20 \
  --simulate-streaming 1 \
  --causal-convolution 1 \
  --decode-chunk-size 16 \
  --left-context 64 \
  --exp-dir ./pruned_transducer_stateless3/exp \
  --max-duration 600 \
  --beam 4.0 \
  --max-contexts 8 \
  --max-states 32 \
  --decoding-method modified_beam_search_rnnlm_LODR \
  --rnn-lm-scale 0.4 \
  --rnn-lm-exp-dir ../LM/rnnlm-exp \
  --rnn-lm-epoch 0 \
  --rnn-lm-avg 1 \
  --rnn-lm-num-layers 3 \
  --rnn-lm-tie-weights 1 \
  --tokens-ngram 2 \
  --ngram-lm-scale -0.16

@ngoel17 ngoel17 reopened this Dec 9, 2022
@ngoel17
Contributor Author

ngoel17 commented Dec 9, 2022

I also notice something unusual about the 2gram.fst.txt.
It ends with
452 282 461 461 11.9809
452 0 500 0 5.33528
452 0.328656

but there is no state '0': there are transitions into state 0, yet state 0 is never defined, not even as a final state the way 452 is.

@csukuangfj
Collaborator

but there is no state '0': there are transitions into state 0, yet state 0 is never defined, not even as a final state the way 452 is.

452 0 500 0 5.33528

If 500 is the ID of #0, then I think state 0 corresponds to the backoff state.
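To double-check, here is a small sketch (mine, assuming kaldilm's text FST format, where arc lines are "src dst ilabel olabel cost", and assuming #0 indeed has ID 500). It lists every arc whose input label is the backoff symbol; each of them should point into the backoff state, i.e. state 0:

# Sketch only: list backoff (#0) arcs in the text FST.
backoff_id = 500
with open("2gram.fst.txt") as f:
    for line in f:
        fields = line.split()
        if len(fields) == 5 and int(fields[2]) == backoff_id:
            print(f"backoff arc: state {fields[0]} -> state {fields[1]}")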

@csukuangfj
Collaborator

assert current_ngram_score <= 0.0, (
AssertionError: (-inf, -inf)

Could you check that the default value 500 is also the ID of #0 in your tokens.txt?

"--backoff-id",
type=int,
default=500,
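A quick way to check this (a sketch of mine, assuming the usual two-column "symbol id" layout of tokens.txt and the data/lang_bpe_500 directory used earlier):

# Sketch only: confirm that the ID of "#0" matches --backoff-id.
with open("data/lang_bpe_500/tokens.txt") as f:
    sym2id = dict(line.split() for line in f if line.strip())
print("ID of #0:", sym2id["#0"])  # should equal --backoff-id (default 500)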

@ngoel17
Contributor Author

ngoel17 commented Dec 12, 2022

Yes, backoff-id is 500. This is the end of tokens.txt (the quotes were inserted by me so that GitHub displays the text properly):
"I 496"
"P 497"
"* 498"
"[ 499"
"#0 500"
"#1 501"

@marcoyang1998
Collaborator

marcoyang1998 commented Dec 14, 2022

I think there must be something wrong with your bigram. How was it generated?

Also, are you doing a cross-domain or intra-domain evaluation?

@ngoel17
Contributor Author

ngoel17 commented Dec 14, 2022

Below is the LM generation script I used. In pruned_transducer_stateless3, a bunch of the data for the 2nd output is cross-domain, but the primary output is intra-domain. Evaluation is intra-domain, but on previously unseen data.

#!/usr/bin/env bash

lang_dir=data/lang_bpe_500

for ngram in 2; do
  if [ ! -f $lang_dir/${ngram}gram.arpa ]; then
    ./shared/make_kn_lm.py \
      -ngram-order ${ngram} \
      -text $lang_dir/transcript_tokens.txt \
      -lm $lang_dir/${ngram}gram.arpa
  fi

  if [ ! -f $lang_dir/${ngram}gram.fst.txt ]; then
    python3 -m kaldilm \
      --read-symbol-table="$lang_dir/tokens.txt" \
      --disambig-symbol='#0' \
      --max-order=${ngram} \
      $lang_dir/${ngram}gram.arpa > $lang_dir/${ngram}gram.fst.txt
  fi
done

@nshmyrev
Contributor

nshmyrev commented Jul 15, 2023

Never mind, I figured it out: to avoid the inf, the LODR n-gram should include <unk> and must have a vocabulary of exactly 500 tokens.
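In case it is useful to others, here is a diagnostic sketch (mine, assuming the lang_bpe_500 layout from the LM-generation script earlier in this thread and a standard ARPA file with a "\1-grams:" section). It lists the entries of tokens.txt that never received a unigram in the ARPA; a token that the n-gram LM cannot score at all, not even via backoff, can end up with a -inf score during LODR rescoring:

# Sketch only: find tokens with no unigram entry in the ARPA LM.
lang_dir = "data/lang_bpe_500"

with open(f"{lang_dir}/tokens.txt") as f:
    tokens = {line.split()[0] for line in f if line.strip() and not line.startswith("#")}

in_unigrams = False
arpa_unigrams = set()
with open(f"{lang_dir}/2gram.arpa") as f:
    for line in f:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
        elif line.startswith("\\"):
            in_unigrams = False
        elif in_unigrams and line:
            arpa_unigrams.add(line.split()[1])

# <blk> and <sos/eos> are not expected in the ARPA; everything else should be.
print("tokens with no unigram entry:", tokens - arpa_unigrams - {"<blk>", "<sos/eos>"})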

@sangeet2020

sangeet2020 commented Feb 22, 2024

Hi @csukuangfj,
I am also facing this issue, but haven't been able to work it out and find a solution.
Could you please let me know how I should solve it?

Error logs

2024-02-22 13:26:51,691 INFO [decode.py:834] Decoding started
2024-02-22 13:26:51,692 INFO [decode.py:840] Device: cuda:0
2024-02-22 13:26:51,696 INFO [decode.py:850] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.17.0.dev+git.230c8fcb.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'c78407a-dirty', 'icefall-git-date': 'Fri Feb 16 16:38:45 2024', 'icefall-path': '/mnt/local/sangeet/workncode/k2-fsa/icefall', 'k2-path': '/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/mnt/local/sangeet/workncode/lhotse/lhotse/__init__.py', 'hostname': 'emlgpu04', 'IP address': '127.0.1.1'}, 'epoch': 30, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal/1200'), 'bpe_model': 'Deu16_icefall/sample_data/lang_bpe_500/bpe.model', 'lang_dir': PosixPath('Deu16_icefall/sample_data/lm'), 'decoding_method': 'modified_beam_search_LODR', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': -0.24, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': True, 'lm_type': 'rnn', 'lm_scale': 0.42, 'tokens_ngram': 2, 'backoff_id': 500, 'context_score': 2.0, 'context_file': '', 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': False, 'manifest_dir': PosixPath('Deu16_icefall/sample_data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 19, 'lm_avg': 2, 'lm_exp_dir': '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800/', 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/1200/modified_beam_search_LODR'), 'has_contexts': False, 'suffix': 'epoch-30-avg-7-chunk-16-left-context-128-modified_beam_search_LODR-beam-size-4-rnn-lm-scale-0.42-LODR-2gram-scale--0.24-use-averaged-model', 'blank_id': 0, 'unk_id': 3, 'vocab_size': 500}
2024-02-22 13:26:51,696 INFO [decode.py:852] About to create model
2024-02-22 13:26:52,409 INFO [decode.py:919] Calculating the averaged model over epoch range from 23 (excluded) to 30
2024-02-22 13:26:56,392 INFO [model.py:75] Tying weights
2024-02-22 13:26:56,392 INFO [lm_wrapper.py:180] averaging ['/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-18.pt', '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-19.pt']
2024-02-22 13:26:58,886 INFO [decode.py:976] Loading token level lm: G_2_gram.fst.txt
2024-02-22 13:26:59,022 INFO [decode.py:982] num states: 12143
2024-02-22 13:26:59,027 INFO [decode.py:1018] Number of model parameters: 66110931
2024-02-22 13:26:59,027 INFO [asr_datamodule.py:409] About to get test cuts
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda-11.2/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
Traceback (most recent call last):
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1051, in <module>
    main()
  File "/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1028, in main
    results_dict = decode_dataset(
                   ^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 680, in decode_dataset
    hyps_dict = decode_one_batch(
                ^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 535, in decode_one_batch
    hyp_tokens = modified_beam_search_LODR(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/zipformer/beam_search.py", line 2623, in modified_beam_search_LODR
    assert current_ngram_score <= 0.0, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: (-inf, -inf)

I have made sure that my RNN-LM training arguments are the same as the arguments for decode.py with modified_beam_search_LODR.
Also

wc -l Deu16_icefall/sample_data/lang_bpe_500/tokens.txt  >> 502

I am not sure how to make it 500.

The head and tail of tokens.txt look like this:

$$ head
<blk> 0
<sos/eos> 1
<UNK> 2
<unk> 3
▁ 4
S 5
T 6
EN 7
E 8
N 9


$$ tail
GRÖßTE 492
▁PAKISTAN 493
▁SEPTEMBER 494
▁STREIFEN 495
▁SCHWARZ 496
▁KÜNFTIG 497
▁STUTTGART 498
Q 499
#0 500
#1 501

Any clue to help me solve this would be appreciated.

Thank You

@csukuangfj
Collaborator

Xiaoyu, could you have a look?

@sangeet2020

sangeet2020 commented Feb 22, 2024

I re-trained the RNN-LM and the WER looks better. However, the results do not seem quite consistent.

WER with greedy search: ~9
WER with beam search: ~8.5

WER with modified beam search with shallow fusion and an external LM: ~8.6
WER with modified beam search with LODR (to counter the ILM) and an external LM: ERROR
WER with modified beam search with LM rescoring to re-rank the n-best hypotheses after beam search: ~8.8

@marcoyang1998 any clues why this could be happening? Also, any help on how I could fix the above error would be appreciated.

Thank You

@sangeet2020

Hello @marcoyang1998,
I was wondering if you had a chance to look at the above error and could point me in some direction so that I can find the cause and fix it.

@marcoyang1998
Collaborator

marcoyang1998 commented Mar 26, 2024 via email

@sangeet2020

Hi @marcoyang1998 ,

Waiting for an update. Any clue would be enough for me to figure out what could be wrong.

@duhtapioca

Facing the same error

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2620 # calculate the score of the latest token
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,
   2626 )
   2627 # score = score + TDLM_score - LODR_score
   2628 # LODR_LM_scale should be a negative number here
   2629 hyp_log_prob += (
   2630     lm_score[new_token] * lm_scale
   2631     + LODR_lm_scale * current_ngram_score
   2632     + context_score
   2633 )  # add the lm score

AssertionError: (-inf, -inf)

Head of the tokens.txt

<blk> 0
<sos/eos> 1
<unk> 2

Tail of the tokens.txt

tail -4 tokens.txt 
#0 500
#1 501
#2 502
#3 503

Any help regarding this?

@danpovey
Collaborator

danpovey commented Jul 7, 2024

I'll try to find someone to look into this. Basically we need to trace back where the infinity came from and why. That may require adding assert statements to catch the infinity earlier. We should also find or decide where an infinity is "allowed" according to the intended interfaces used here.
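For example, something along these lines (only a sketch of the kind of earlier check suggested here, placed just before the existing assert in modified_beam_search_LODR; the variable names state_cost and new_token follow the traceback above, and this is untested):

import math  # in beam_search.py this would live with the other imports

if math.isinf(state_cost.lm_score):
    # The n-gram LM assigned zero probability to `new_token`, i.e. the token
    # (or its backoff path) is missing from the LODR FST.
    raise ValueError(
        f"LODR n-gram score is -inf for token id {new_token}; "
        "check that every token in tokens.txt is covered by the n-gram LM."
    )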

@csukuangfj
Collaborator

The latest code in the master is

    unsorted_indices = packed_encoder_out.unsorted_indices.tolist()
    for i in range(N):
        ans.append(sorted_ans[unsorted_indices[i]])
    return ans


def modified_beam_search_LODR(

From your log

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2620 # calculate the score of the latest token
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,

Could you first try the latest master and see if the issue persists?
@duhtapioca

@sangeet2020

I tried but ended up with the same error.

@marcoyang1998
Collaborator

@duhtapioca Could you please run your code again and print out the value of new_token when triggering this assertion?

@duhtapioca

Could you please run your code again and print out the value of new_token when triggering this assertion?

Yes, will try that and share the output soon.

@duhtapioca

duhtapioca commented Jul 9, 2024

@marcoyang1998

The output is now

************** The new token is - 98
************** The new token is - 94
************** The new token is - 308
************** The new token is - 280
************** The new token is - 95
************** The new token is - 60
************** The new token is - 233
************** The new token is - 19
************** The new token is - 23
************** The new token is - 103
************** The new token is - 207
************** The new token is - 8

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 predict_modified_beamsearch_LODR(['/home/azureuser/users/shreya/hindi_test_Set/hindi_test_main/test_main_wavs/3aa183cc-dd52-4822-9f2d-9ccaebafbac2.wav'])

File /anaconda/envs/k2_icefall/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

Cell In[5], line 227, in predict_modified_beamsearch_LODR(batch)
    217 context_graph.build(contexts)
    219 hyp_tokens = modified_beam_search(
    220         model=model,
    221         encoder_out=encoder_out,
   (...)
    224         context_graph=context_graph,
    225     )
--> 227 hyp_tokens = modified_beam_search_LODR(
    228         model=model,
    229         encoder_out=encoder_out,
    230         encoder_out_lens=encoder_out_lens,
    231         beam=params.beam_size,
    232         LODR_lm=ngram_lm,
    233         LODR_lm_scale=-0.24,
    234         LM=LM,
    235         context_graph=None,
    236     )
    237 print("Hyp tokens created")
    239 for hyp in sp.decode(hyp_tokens):

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
   2622 print("************** The new token is - "+ str(new_token))
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,
   2626 )
   2627 # score = score + TDLM_score - LODR_score
   2628 # LODR_LM_scale should be a negative number here
   2629 hyp_log_prob += (
   2630     lm_score[new_token] * lm_scale
   2631     + LODR_lm_scale * current_ngram_score
   2632     + context_score
   2633 )  # add the lm score

AssertionError: (-inf, -inf)

For reference, tokens.txt was generated by egs/librispeech/ASR/local/prepare_lang_bpe.py.
