Real-Time Voice Cloning in Spanish

This repository is a fork of Real Time Voice Cloning (RTVC) with a synthesizer that works for Spanish. See my paper for a more detailed explanation. Demo audio from all the Spanish models we trained (plus a sample from RacoonML's trained model) is available here.

Papers implemented (by CorentinJ)

| URL | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1703.10135 | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | fatchord/WaveRNN |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |

Dataset used

Mozilla's Common Voice Spanish dataset

Setup

1. Install Requirements

Python 3.6 or 3.7 is needed to run the toolbox.

  • Install PyTorch (>=1.1.0).
  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.
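
Before running pip, it can help to confirm the prerequisites listed above. The following is a minimal sketch, not part of this repository: the function name check_environment is hypothetical, and it only checks that the Python version is 3.6 or 3.7, that ffmpeg is on PATH, and that a torch package is installed. It does not verify the PyTorch version.

```python
import shutil
import sys
from importlib import util


def check_environment(version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    Hypothetical helper, not shipped with the toolbox.
    """
    problems = []
    # The README asks for Python 3.6 or 3.7.
    if tuple(version_info[:2]) not in ((3, 6), (3, 7)):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found; 3.6 or 3.7 expected"
        )
    # ffmpeg must be reachable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # find_spec looks the package up without importing it.
    if util.find_spec("torch") is None:
        problems.append("PyTorch (torch) is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK")
```

Run it with the same interpreter you intend to use for the toolbox, so the version check reflects the right environment.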

2. Download Pretrained Models

Download the latest here.

3. Try the demo CLI

python demo_cli.py

If all tests pass, you're good to go.

4. Launch the Toolbox

You can then try the toolbox: python demo_toolbox.py
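
Steps 3 and 4 can be chained so the toolbox only opens once the demo CLI has run cleanly. This is a sketch under one assumption that the README does not state: that demo_cli.py reports failure through a non-zero exit code. The helper names run_step and main are hypothetical.

```python
import subprocess
import sys


def run_step(script, *args):
    """Run one script with the current interpreter; return True if it exits with code 0."""
    result = subprocess.run([sys.executable, script, *args])
    return result.returncode == 0


def main():
    # Assumption: a failed demo run shows up as a non-zero exit code.
    if not run_step("demo_cli.py"):
        print("demo_cli.py did not exit cleanly; fix that before opening the toolbox")
        return 1
    return 0 if run_step("demo_toolbox.py") else 1

# From the repository root, call: sys.exit(main())
```

Using sys.executable keeps both scripts on the interpreter where the requirements were installed.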