LODR RNNLM rescoring requirements #749
Comments
It seems to me that the model architecture does not match. You are using the default dimensions (2048) in decode.py, but the checkpoint was trained with 600. You might need to change the RNN-LM dimension arguments passed to decode.py so that they match the training configuration.
Can you check that you use the same RNN LM model parameters for both train.py and decode.py?
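For what it's worth, a quick way to see which dimensions a saved RNN-LM checkpoint was actually trained with is to load it and inspect a couple of parameter shapes. This is only a sketch: the checkpoint path is a placeholder, and the state-dict keys are the ones that appear in the size-mismatch message quoted in this issue.

    import torch

    # Placeholder path: point this at the RNN-LM checkpoint passed to decode.py.
    ckpt = torch.load("rnnlm-exp/epoch-0.pt", map_location="cpu")
    state_dict = ckpt["model"]

    emb = state_dict["input_embedding.weight"]   # shape: (vocab_size, embedding_dim)
    w_ih = state_dict["rnn.weight_ih_l0"]        # shape: (4 * hidden_dim, embedding_dim) for an LSTM

    print("vocab size:   ", emb.shape[0])
    print("embedding dim:", emb.shape[1])
    print("hidden dim:   ", w_ih.shape[0] // 4)

Whatever this prints should match the RNN-LM dimension arguments given to decode.py; if decode.py is left at its defaults (2048) while the LM was trained with 600, you get exactly the size mismatch reported here.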
Thanks. That resolved it.
I ran into a different issue this time. May I get some guidance here on how to resolve it?
File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1259, in
Could you please post the full command you are using to invoke decode.py?
python3 -m pdb ./pruned_transducer_stateless3/decode1.py
I also notice something unusual about the 2gram.fst.txt: there are transitions to state "0", but state "0" is never defined, not even as a final state the way 452 is.
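A small script along these lines can make that kind of anomaly easier to spot. It is only a sketch and assumes the usual text-FST layout, where arc lines start with a source state and a destination state and shorter lines mark final states; adjust the column handling if your 2gram.fst.txt differs.

    import math

    arc_src, arc_dst, finals = set(), set(), set()
    inf_scores = 0
    with open("2gram.fst.txt") as f:          # placeholder path
        for line in f:
            fields = line.split()
            if len(fields) >= 3:              # arc line: src dst label [score]
                arc_src.add(fields[0])
                arc_dst.add(fields[1])
                try:
                    if math.isinf(float(fields[-1])):
                        inf_scores += 1
                except ValueError:
                    pass
            elif fields:                      # final-state line
                finals.add(fields[0])

    undefined = arc_dst - arc_src - finals
    print("states used as a destination but never defined:", sorted(undefined)[:20])
    print("arcs with infinite scores:", inf_scores)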
Could you check that the default value of --backoff-id matches your setup? See icefall/egs/librispeech/ASR/pruned_transducer_stateless3/decode.py, lines 460 to 462 in b25c234.
Yes, backoff-id is 500. This is the end of tokens.txt ("" inserted by me so that GitHub displays the text properly).
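For anyone comparing their setup, a quick sanity check (sketch only, the path is a placeholder) is to look at the ids in tokens.txt next to the --backoff-id value:

    ids = {}
    with open("data/lang_bpe_500/tokens.txt") as f:    # lines look like "<symbol> <id>"
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                ids[parts[0]] = int(parts[1])

    print("number of tokens:", len(ids))
    print("max token id:    ", max(ids.values()))
    # The backoff/disambig symbol used when compiling the bigram (e.g. #0) is often
    # given the id right after the last token, hence the default of 500 for a
    # 500-token vocabulary; --backoff-id has to match whatever id it actually got.
    print("#0 id (if present):", ids.get("#0"))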
I think there must be something wrong with your bigram. How was it generated? Also, are you doing a cross-domain or intra-domain evaluation?
Below is the LM generation script I used. In pruned_transducer_stateless3, a bunch of data for the 2nd output is cross-domain, but the primary is intra-domain. Evaluation is intra-domain, but on previously unseen data.

#!/usr/bin/env bash
lang_dir=data/lang_bpe_500
for ngram in 2 ; do
  if [ ! -f
Never mind, I figured it out: in order to avoid inf, the LODR ngram should have
Hi @csukuangfj ,
Error logs:
2024-02-22 13:26:51,691 INFO [decode.py:834] Decoding started
2024-02-22 13:26:51,692 INFO [decode.py:840] Device: cuda:0
2024-02-22 13:26:51,696 INFO [decode.py:850] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.17.0.dev+git.230c8fcb.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'c78407a-dirty', 'icefall-git-date': 'Fri Feb 16 16:38:45 2024', 'icefall-path': '/mnt/local/sangeet/workncode/k2-fsa/icefall', 'k2-path': '/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/mnt/local/sangeet/workncode/lhotse/lhotse/__init__.py', 'hostname': 'emlgpu04', 'IP address': '127.0.1.1'}, 'epoch': 30, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal/1200'), 'bpe_model': 'Deu16_icefall/sample_data/lang_bpe_500/bpe.model', 'lang_dir': PosixPath('Deu16_icefall/sample_data/lm'), 'decoding_method': 'modified_beam_search_LODR', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': -0.24, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': True, 'lm_type': 'rnn', 'lm_scale': 0.42, 'tokens_ngram': 2, 'backoff_id': 500, 'context_score': 2.0, 'context_file': '', 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': False, 'manifest_dir': PosixPath('Deu16_icefall/sample_data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 19, 'lm_avg': 2, 'lm_exp_dir': '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800/', 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/1200/modified_beam_search_LODR'), 'has_contexts': False, 'suffix': 'epoch-30-avg-7-chunk-16-left-context-128-modified_beam_search_LODR-beam-size-4-rnn-lm-scale-0.42-LODR-2gram-scale--0.24-use-averaged-model', 'blank_id': 0, 'unk_id': 3, 'vocab_size': 500}
2024-02-22 13:26:51,696 INFO [decode.py:852] About to create model
2024-02-22 13:26:52,409 INFO [decode.py:919] Calculating the averaged model over epoch range from 23 (excluded) to 30
2024-02-22 13:26:56,392 INFO [model.py:75] Tying weights
2024-02-22 13:26:56,392 INFO [lm_wrapper.py:180] averaging ['/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-18.pt', '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-19.pt']
2024-02-22 13:26:58,886 INFO [decode.py:976] Loading token level lm: G_2_gram.fst.txt
2024-02-22 13:26:59,022 INFO [decode.py:982] num states: 12143
2024-02-22 13:26:59,027 INFO [decode.py:1018] Number of model parameters: 66110931
2024-02-22 13:26:59,027 INFO [asr_datamodule.py:409] About to get test cuts
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda-11.2/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
Traceback (most recent call last):
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1051, in <module>
main()
File "/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1028, in main
results_dict = decode_dataset(
^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 680, in decode_dataset
hyps_dict = decode_one_batch(
^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 535, in decode_one_batch
hyp_tokens = modified_beam_search_LODR(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/zipformer/beam_search.py", line 2623, in modified_beam_search_LODR
assert current_ngram_score <= 0.0, (
^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: (-inf, -inf)
I have made sure that my RNN-LM training arguments are the same as the arguments for decode.py with
I am not sure how to make it 500. The head and tail of tokens.txt look like this:
If anyone can help me solve it with some clue, thank you.
Xiaoyu, could you have a look?
I re-trained the RNN-LM and the WER looks better. However, the results do not seem quite consistent.
WER with greedy search: ~9
WER with modified beam search with shallow fusion and an external LM: ~8.6
@marcoyang1998 any clues why this could be happening? Also, any help on how I could fix the above error? Thank you.
Hello @marcoyang1998 , I was wondering if you had a chance to see the above error and point me in some direction so that I can find out the reason and fix it.
Sorry for getting back so late, I will have a look at the error this week.
Hi @marcoyang1998 , waiting for an update. Any clue would be fine for me to figure out what could be wrong.
Facing the same error.
Head of the tokens.txt:
Tail of the tokens.txt:
Any help regarding this?
I'll try to find someone to look into this. Basically we need to trace back where the infinity came from and why. That may require adding assert statements to catch the infinity earlier. We should also find or decide where an infinity is "allowed" according to the intended interfaces used here.
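One concrete thing worth checking is whether every token id actually has some probability in the bigram; a token that the LM training text never produced can plausibly end up with no arc at all and therefore a -inf score. Below is a rough coverage check, under the same assumptions as the earlier sketch (plain-text FST with integer labels in the third column of each arc line; paths are placeholders).

    token_ids = set()
    with open("data/lang_bpe_500/tokens.txt") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                token_ids.add(int(parts[1]))

    seen = set()
    with open("G_2_gram.fst.txt") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:              # arc line: src dst label [score]
                seen.add(int(fields[2]))

    # Note: special symbols such as <blk> would not be expected in the LM anyway.
    missing = sorted(token_ids - seen)
    print(len(missing), "token ids never appear in the bigram:", missing[:20])

If regular BPE pieces show up as missing, rebuilding the LM so that every token gets at least a unigram probability would be the first thing I would try.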
The latest code in the master is at icefall/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py, lines 2618 to 2625 in 2d64228.
From your log:
Could you first try the latest master and see if the issue persists?
I tried but ended up with the same error.
@duhtapioca Could you please run your code again and print out the value of
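In case it is useful, here is a sketch of the kind of temporary instrumentation that could go right before the failing assert in beam_search.py; current_ngram_score is the only name taken from the traceback, everything else is a placeholder to be replaced with the actual local variables in that function.

    # Temporary debugging aid inside modified_beam_search_LODR, just before
    # "assert current_ngram_score <= 0.0":
    if current_ngram_score == float("-inf"):
        print("LODR ngram returned -inf")
        print("  current_ngram_score:", current_ngram_score)
        # Also print whatever identifies the hypothesis and the token being scored
        # in your copy of the code, e.g. (placeholder names):
        # print("  token:", new_token, "context:", hyp.ys[-2:])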
Yes, will try that and share the output soon.
The output is now:
tokens.txt was generated by egs/librispeech/ASR/local/prepare_lang_bpe.py, for reference.
I am trying to understand the requirements on the RNNLM for LODR rescoring.
I am using something along the lines of the Librispeech pruned_transducer_stateless3 recipe, with https://github.com/k2-fsa/icefall/tree/master/egs/ptb/LM as the prototype for LM training (except 3 layers, 600 dim, and tie-weights true).
I get the following error messages:
2022-12-09 03:13:25,663 INFO [decode1.py:1185] lm filename: 2gram.fst.txt
2022-12-09 03:13:25,796 INFO [decode1.py:1191] num states: 453
2022-12-09 03:13:26,397 INFO [model.py:69] Tying weights
2022-12-09 03:13:26,397 INFO [checkpoint.py:112] Loading checkpoint from ../ngLM/rnnlm-exp/epoch-0.pt
Traceback (most recent call last):
File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1259, in
main()
File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1210, in main
load_checkpoint(
File "/home/ngoel/icefall/icefall/checkpoint.py", line 126, in load_checkpoint
model.load_state_dict(checkpoint["model"], strict=strict)
File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1490, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RnnLmModel:
size mismatch for input_embedding.weight: copying a param with shape torch.Size([500, 600]) from checkpoint, the shape in current model is torch.Size([500, 2048]).
size mismatch for rnn.weight_ih_l0: copying a param with shape torch.Size([2400, 600]) from checkpoint, the shape in current model is torch.Size([8192, 2048]).
It's not clear to me what this message means and how to fix this. Some guidance is appreciated.