This is a reference version; please use TCSinger for better technique control!
- Download GTSinger and process all JSON files into `metadata.json`.
- Put `metadata.json` (including `ph`, `word`, `item_name`, `ph_durs`, `tech`, `mix_tech`, `falsetto_tech`, `breathy_tech`, `pharyngeal_tech`, `vibrato_tech`, `glissando_tech`, `wav_fn`, `singer`, `ep_pitches`, `ep_notedurs`, and `ep_types` for each singing voice), `spker_set.json` (including all singers and their ids), and `phone_set.json` (all phonemes of your dictionary) in `data/processed/style`.
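The snippet below is a minimal sketch of what the three files could contain. The field names come from the list above; every value (item names, phonemes, paths, ids) is made up purely for illustration.

```python
import json

# Hypothetical metadata.json entry (a list of such dicts); all values are
# illustrative, only the field names follow the README.
entry = {
    "item_name": "singer1#song1#0000",
    "ph": ["t", "a", "SP"],
    "word": ["ta", "<SP>"],
    "ph_durs": [0.12, 0.30, 0.08],   # one duration per phoneme, in seconds
    "tech": [0, 1, 0],               # technique labels per phoneme
    "mix_tech": [0, 0, 0],
    "falsetto_tech": [0, 1, 0],
    "breathy_tech": [0, 0, 0],
    "pharyngeal_tech": [0, 0, 0],
    "vibrato_tech": [0, 0, 0],
    "glissando_tech": [0, 0, 0],
    "wav_fn": "data/raw/singer1/song1_0000.wav",
    "singer": "singer1",
    "ep_pitches": [62, 64, 0],       # note pitches (e.g. MIDI numbers)
    "ep_notedurs": [0.12, 0.30, 0.08],
    "ep_types": [1, 1, 2],
}
spker_set = {"singer1": 0}            # spker_set.json: singer name -> id
phone_set = sorted(set(entry["ph"]))  # phone_set.json: all phonemes used
print(json.dumps(phone_set))
```

Each per-phoneme list (`ph_durs`, the `*_tech` labels) should have the same length as `ph`.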
- Set `processed_data_dir`, `binary_data_dir`, `valid_prefixes`, and `test_prefixes` in the config.
- Preprocess Dataset
```bash
export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=$GPU python data_gen/tts/bin/binarize.py --config egs/stylesinger.yaml
```
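For reference, the four config fields could look like this in `egs/stylesinger.yaml`; the binary directory and the prefixes are illustrative assumptions, not the repo's actual values (prefixes match the beginning of `item_name`s held out for validation/testing):

```yaml
processed_data_dir: data/processed/style
binary_data_dir: data/binary/style      # assumed output path, pick your own
valid_prefixes:
  - singer1#song1                       # hypothetical item_name prefixes
test_prefixes:
  - singer1#song2
```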
- Train StyleSinger
```bash
CUDA_VISIBLE_DEVICES=$GPU python tasks/run.py --config egs/stylesinger.yaml --exp_name StyleSinger --reset
```
- Inference with StyleSinger
```bash
CUDA_VISIBLE_DEVICES=$GPU python tasks/run.py --config egs/stylesinger.yaml --exp_name StyleSinger --infer
```
This implementation uses parts of the code from the following GitHub repo, as noted in our code: StyleSinger.