This is an implementation of the pseudo-whispered speech conversion method from our paper Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation (pdf; published in the proceedings of ASRU 2023).
- Python 3.9
- NumPy
- soundfile
- librosa
- PyWorld
- utils.py
This script has all the essential functions used in our proposed method.
Note: In our work, the speech files are sampled at or re-sampled to 16 kHz, so the parameters of GFM-IAIF-GC are based on this sample rate.
- data_gen.py
This script is used to convert normal speech into pseudo-whispered speech.
- rq2_gen.py
This script is used to convert normal speech into:
  - normal speech without glottal contributions;
  - normal speech with widened formant bandwidth and shifted formant frequencies.
1. Convert normal speech into pseudo-whispered speech from your dataset:
python data_gen.py --data_list './list_example(PATH TO THE LIST OF SOURCE TRAINING DATA)' \
--output_dir './data/training/wTIMIT/PW(PATH TO OUTPUT PW DIRECTORY)'
2. Convert normal speech into 1) normal speech without glottal contributions:
python rq2_gen.py --data_list './list_example(PATH TO THE LIST OF SOURCE TRAINING DATA)' \
--output_dir './data/training/wTIMIT/s1(PATH TO OUTPUT DIRECTORY)' \
--generating_mode '1'
3. Convert normal speech into 2) normal speech with widened formant bandwidth and shifted formant frequencies:
python rq2_gen.py --data_list './list_example(PATH TO THE LIST OF SOURCE TRAINING DATA)' \
--output_dir './data/training/wTIMIT/s2(PATH TO OUTPUT DIRECTORY)' \
--generating_mode '2'
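The three commands above differ only in the script, the output sub-directory, and the generating mode, so they can be driven from one small Python wrapper. The sketch below only assembles and optionally runs the same command lines; the helper names `build_commands` and `run_all` and the `dry_run` switch are hypothetical, and the `PW`, `s1`, `s2` sub-directory names are taken from the example paths above.

```python
import subprocess
import sys


def build_commands(data_list, output_root):
    """Return the three command lines from the steps above as argv lists."""
    py = sys.executable  # use the interpreter that runs this wrapper
    return [
        # 1. normal speech -> pseudo-whispered speech
        [py, "data_gen.py", "--data_list", data_list,
         "--output_dir", f"{output_root}/PW"],
        # 2. normal speech without glottal contributions
        [py, "rq2_gen.py", "--data_list", data_list,
         "--output_dir", f"{output_root}/s1", "--generating_mode", "1"],
        # 3. widened formant bandwidth and shifted formant frequencies
        [py, "rq2_gen.py", "--data_list", data_list,
         "--output_dir", f"{output_root}/s2", "--generating_mode", "2"],
    ]


def run_all(data_list, output_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(data_list, output_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing conversion.
            subprocess.run(cmd, check=True)
```

For example, `run_all("./list_example", "./data/training/wTIMIT")` prints the three commands without running them; pass `dry_run=False` from the repository root to execute them.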
Note: you can check ./list_example to see an example of the input data list. You can get the list by using this command:
find ./corpora/wTIMIT/nist/TRAIN/normal/US/(PATH TO SOURCE TRAINING DATA) -name "*.WAV" | awk '{split($0,a,"/");split(a[14],b,"."); print b[1] ,$0}' > list
You may need to check your data directory and change a[14] in the command so that it indexes the file-name component of your paths.
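If counting path components for a[14] is inconvenient, the same kind of list can be produced in Python, where the file name is taken directly from each path. This is a sketch that mirrors the output of the find/awk command (one line per file: the file name up to its first dot, a space, then the path); the function name `write_data_list` is hypothetical, and the entries are sorted, which find does not guarantee. Paths are written as pathlib renders them, so a leading "./" is dropped.

```python
from pathlib import Path


def write_data_list(source_dir, list_path, pattern="*.WAV"):
    """Write '<utterance id> <path>' lines for every matching file.

    Hypothetical helper mirroring the find/awk one-liner above.
    """
    source_dir = Path(source_dir)
    # rglob searches recursively, like `find ... -name "*.WAV"`.
    files = sorted(p for p in source_dir.rglob(pattern) if p.is_file())

    with open(list_path, "w", encoding="utf-8") as f:
        for p in files:
            # Same as awk's b[1]: text before the first dot of the file name.
            utt_id = p.name.split(".")[0]
            f.write(f"{utt_id} {p}\n")
    return len(files)
```

The resulting file can then be passed to `--data_list`; compare a few lines against ./list_example to confirm the format matches before running the conversion.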
@INPROCEEDINGS{10389801,
author={Lin, Zhaofeng and Patel, Tanvina and Scharenborg, Odette},
booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
title={Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation},
year={2023},
volume={},
number={},
pages={1-8},
keywords={Error analysis;Databases;Conferences;Training data;Transforms;Data augmentation;Acoustics;Whispered speech;pseudo-whisper;end-to-end speech recognition;wTIMIT;signal processing},
doi={10.1109/ASRU57964.2023.10389801}}
If you have any questions, feel free to open an issue or send me an email at linzh (at) tcd.ie.