This repository is a fork of Real Time Voice Cloning (RTVC) with a synthesizer that works for the Spanish language. See my paper for a more detailed explanation. Demo audio from all the Spanish models we trained, plus a sample from RacoonML's trained model, is available here.
| arXiv ID | Designation | Title | Implementation source |
|---|---|---|---|
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1703.10135 | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | fatchord/WaveRNN |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
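The three components in the table run in sequence: the encoder turns a reference recording into a speaker embedding, the synthesizer turns text plus that embedding into a mel spectrogram, and the vocoder turns the spectrogram into a waveform. The sketch below only illustrates that data flow. Every function name and constant in it is hypothetical, the sizes are illustrative assumptions, and the bodies are placeholders that return zeros instead of calling this repository's models.

```python
# Data-flow sketch only: all names and sizes here are illustrative assumptions,
# not this repository's API. Each stage returns zeros of a plausible shape.

EMBEDDING_SIZE = 256   # assumed length of the speaker embedding
N_MELS = 80            # assumed number of mel channels per frame
FRAMES_PER_CHAR = 5    # made-up text-to-frames ratio for the placeholder
HOP_LENGTH = 200       # assumed number of audio samples per mel frame


def encode_speaker(reference_wav):
    """Encoder stage (GE2E): reference audio -> fixed-size speaker embedding."""
    return [0.0] * EMBEDDING_SIZE


def synthesize(text, embedding):
    """Synthesizer stage (Tacotron): text + embedding -> mel spectrogram frames."""
    n_frames = len(text) * FRAMES_PER_CHAR
    return [[0.0] * N_MELS for _ in range(n_frames)]


def vocode(mel_frames):
    """Vocoder stage (WaveRNN): mel spectrogram frames -> waveform samples."""
    return [0.0] * (len(mel_frames) * HOP_LENGTH)


def clone_voice(reference_wav, text):
    """Chain the three stages in the order used by SV2TTS."""
    embedding = encode_speaker(reference_wav)
    mel_frames = synthesize(text, embedding)
    return vocode(mel_frames)


if __name__ == "__main__":
    waveform = clone_voice([0.0] * 16000, "hola")
    print(len(waveform), "samples generated by the placeholder pipeline")
```

The point of the sketch is that the speaker embedding is computed once from the reference audio and then conditions the synthesizer, while the vocoder only ever sees the spectrogram.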
Mozilla's Common Voice Spanish dataset
Python 3.6 or 3.7 is needed to run the toolbox.
- Install PyTorch (>=1.1.0).
- Install ffmpeg.
- Run `pip install -r requirements.txt` to install the remaining necessary packages.
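The requirements above can be checked before launching anything. The following is a minimal sketch, not part of this repository: the helper names (`parse_major_minor`, `find_problems`, `detect_torch_version`) are hypothetical, and it assumes `ffmpeg` is meant to be reachable on `PATH`.

```python
import shutil
import sys


def parse_major_minor(version):
    """Turn a version string such as '1.1.0+cu100' into (1, 1)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def find_problems(python_version, ffmpeg_path, torch_version):
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) not in ((3, 6), (3, 7)):
        problems.append("Python 3.6 or 3.7 is needed")
    if ffmpeg_path is None:
        problems.append("ffmpeg was not found on PATH")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_major_minor(torch_version) < (1, 1):
        problems.append("PyTorch >= 1.1.0 is needed")
    return problems


def detect_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)


if __name__ == "__main__":
    found = find_problems(
        sys.version_info, shutil.which("ffmpeg"), detect_torch_version()
    )
    for problem in found:
        print("Problem:", problem)
    if not found:
        print("Environment looks fine.")
```

Keeping `find_problems` free of side effects means the checks can be exercised with made-up inputs, independent of the machine the script runs on.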
Download the latest pretrained models here.
Test your configuration by running:
python demo_cli.py
If all tests pass, you're good to go.
You can then try the toolbox:
python demo_toolbox.py