Hi
I want to fine-tune the "stt_en_fastconformer_hybrid_large_streaming_multi" model on my custom data.
I would like to know some best practices for fine-tuning a cache-aware streaming model.
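For context, this is how I load the pretrained checkpoint (a minimal sketch of my setup):

```python
import nemo.collections.asr as nemo_asr

# Download and load the pretrained cache-aware streaming hybrid model.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_hybrid_large_streaming_multi"
)
```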
I am using audio clips of a fixed length (2 s). Is this good, or can I use clips of different lengths? Also, roughly how much total audio is required to fine-tune on a dataset from a different domain (medical data)? For reference, I build my manifest as sketched below.
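A minimal sketch of how I generate the manifest, following the standard NeMo JSON-lines format; the file names, durations, and transcripts are made-up placeholders:

```python
import json

# Each line of a NeMo manifest is one JSON object with audio_filepath,
# duration (in seconds), and text. Durations can vary per clip.
samples = [
    {"audio_filepath": "clips/visit_001.wav", "duration": 2.0,
     "text": "patient reports mild chest pain"},
    {"audio_filepath": "clips/visit_002.wav", "duration": 7.4,
     "text": "prescribed five hundred milligrams amoxicillin"},
]

with open("train_manifest.json", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```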
Which tokenizer should I use? Should I fine-tune with a custom tokenizer built from the new data, or keep the default tokenizer and fine-tune on the new audio only?
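If a custom tokenizer is the way to go, my understanding (an assumption on my part, not something I found stated for this specific model) is that I would train a BPE tokenizer on the new-domain text, e.g. with scripts/tokenizers/process_asr_text_tokenizer.py, and then swap it in with change_vocabulary. The tokenizer directory below is a placeholder:

```python
# asr_model is the hybrid model loaded in the snippet above.
# "tokenizers/medical_bpe_1024" stands in for a BPE tokenizer dir built
# from my medical-domain text.
asr_model.change_vocabulary(
    new_tokenizer_dir="tokenizers/medical_bpe_1024",
    new_tokenizer_type="bpe",
)
```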
How can I make this model work with a different language? Can I fine-tune it directly on audio in another language, e.g. Spanish, presumably after swapping in a Spanish tokenizer as sketched above? Or is there a better way to use it for a different language?
How do I resume training for this model, given that I cannot train in one go? What is the recommended way if I fine-tune using NeMo/examples/asr/speech_to_text_finetune.py?
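From the docs I gather two options: let the script's exp_manager auto-resume from the latest checkpoint (exp_manager.resume_if_exists=true together with exp_manager.resume_ignore_no_checkpoint=true), or resume manually by passing ckpt_path to trainer.fit. Is the sketch below roughly right? It is my own approximation, with placeholder paths throughout:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=100)

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    "stt_en_fastconformer_hybrid_large_streaming_multi"
)
asr_model.set_trainer(trainer)

# Point the data loaders at my manifests (placeholder file names).
with open_dict(asr_model.cfg):
    asr_model.cfg.train_ds.manifest_filepath = "train_manifest.json"
    asr_model.cfg.validation_ds.manifest_filepath = "val_manifest.json"
asr_model.setup_training_data(asr_model.cfg.train_ds)
asr_model.setup_multiple_validation_data(asr_model.cfg.validation_ds)

# ckpt_path restores the weights plus optimizer/scheduler state, so training
# continues from where the previous run stopped.
trainer.fit(asr_model, ckpt_path="experiments/checkpoints/last.ckpt")
```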
Should I use speech_to_text_finetune.py or speech_to_text_hybrid_rnnt_ctc_bpe.py? I want to try both the old vocabulary and a new one, and I need to stop and resume training multiple times.