FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion


In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction, and propose strategies for clean content information extraction without text annotation. We disentangle content information by imposing an information bottleneck on WavLM features, and propose spectrogram-resize (SR) based data augmentation to improve the purity of the extracted content information.
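
The SR augmentation stretches or squeezes a mel-spectrogram along the frequency axis and then crops or pads it back to its original number of bins, perturbing speaker timbre while leaving content intact (the augmented spectrograms are presumably re-synthesized to waveforms with HiFi-GAN, hence the prerequisite below). A minimal sketch of the resize step only, not the repository's preprocess_sr.py; the interpolation and padding choices here are assumptions:

import numpy as np
from scipy.ndimage import zoom

def sr_augment(mel: np.ndarray, target_bins: int) -> np.ndarray:
    """Resize a mel-spectrogram of shape (n_mels, frames) to target_bins
    frequency bins, then crop or pad back to the original n_mels."""
    n_mels, _ = mel.shape
    resized = zoom(mel, (target_bins / n_mels, 1.0), order=1)
    if resized.shape[0] >= n_mels:        # stretched: drop the top bins
        return resized[:n_mels]
    pad = n_mels - resized.shape[0]       # squeezed: pad the top bins
    return np.pad(resized, ((0, pad), (0, 0)), mode="edge")

# hypothetical usage: squeeze an 80-bin mel to 72 bins and pad back;
# the --min/--max values passed to preprocess_sr.py below plausibly
# sweep this target height (an assumption)
aug = sr_augment(np.random.rand(80, 200).astype(np.float32), target_bins=72)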

🤗 Play online at HuggingFace Spaces.

Visit our demo page for audio samples.

We also provide the pretrained models.

Figure: model overview. (a) Training; (b) Inference.

Updates

  • Code released. (Nov 27, 2022)
  • Online demo launched at HuggingFace Spaces. (Dec 14, 2022)
  • 24kHz output support added; see here for details. (Dec 15, 2022)
  • Data-loading bug fixed. (Jan 10, 2023)

Pre-requisites

  1. Clone this repo: git clone https://github.com/OlaWod/FreeVC.git

  2. Enter the repo: cd FreeVC

  3. Install Python requirements: pip install -r requirements.txt

  4. Download WavLM-Large and put it under the 'wavlm/' directory (a quick load check is sketched after this list)

  5. Download the VCTK dataset (for training only)

  6. Download the HiFi-GAN model and put it under the 'hifigan/' directory (for training with SR-based augmentation only)
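
Before moving on, you can verify that the WavLM checkpoint loads. A minimal sketch, assuming the file is saved as wavlm/WavLM-Large.pt and is a standard torch checkpoint with 'cfg' and 'model' entries (the layout of the official WavLM release):

import torch

# load the downloaded checkpoint on CPU; the 'cfg'/'model' keys follow
# the official WavLM release layout (an assumption about your download)
ckpt = torch.load("wavlm/WavLM-Large.pt", map_location="cpu")
print(sorted(ckpt.keys()))   # expect something like ['cfg', 'model']
print(len(ckpt["model"]))    # number of parameter tensors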

Inference Example

Download the pretrained checkpoints and run:

# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc

# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s
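
convert.py reads the conversion pairs from the file given by --txtpath. A minimal sketch for generating such a file, assuming the pipe-separated title|source_wav|target_wav line format of the bundled convert.txt (check that file for the authoritative layout); all paths here are hypothetical:

# one conversion job per line: output title, source utterance (content),
# and target utterance (the voice to convert to)
pairs = [
    ("demo1", "samples/src1.wav", "samples/tgt1.wav"),
    ("demo2", "samples/src2.wav", "samples/tgt2.wav"),
]
with open("convert.txt", "w") as f:
    for title, src, tgt in pairs:
        f.write(f"{title}|{src}|{tgt}\n")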

Training Example

  1. Preprocess
# resample the VCTK wavs to 16 kHz (output goes to dataset/vctk-16k)
python downsample.py --in_dir </path/to/VCTK/wavs>
# the filelists reference the data through the DUMMY symlink
ln -s dataset/vctk-16k DUMMY

# run this if you want a different train-val-test split
python preprocess_flist.py

# run this if you want to use pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py

# run this if you want to train without SR-based augmentation
# (extracts WavLM content features; a conceptual sketch follows this block)
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py

# run these if you want to train with SR-based augmentation; the --min/--max
# flags split the 68-92 range into chunks so the jobs can run in parallel
# on different GPUs
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
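
Conceptually, preprocess_ssl.py walks the 16 kHz wavs and saves WavLM content features for each one. A rough sketch of that step, assuming the wavlm/ module vendored in this repo follows the official WavLM release API (WavLMConfig, extract_features); it is not the repository's exact script:

import torch
from wavlm import WavLM, WavLMConfig  # vendored module, per the repo layout

# build the model from the downloaded checkpoint (API names per the
# official WavLM release; treat them as assumptions)
ckpt = torch.load("wavlm/WavLM-Large.pt", map_location="cpu")
cfg = WavLMConfig(ckpt["cfg"])
model = WavLM(cfg)
model.load_state_dict(ckpt["model"])
model.eval()

wav = torch.randn(1, 16000)  # stand-in for one second of 16 kHz audio
with torch.no_grad():
    feats = model.extract_features(wav)[0]  # (1, frames, feature_dim)
torch.save(feats.cpu(), "dataset/example.pt")  # hypothetical output path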
  2. Train
# train freevc
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc

# train freevc-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s
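
The -c flag points train.py at a JSON hyperparameter file. A small sketch for inspecting one before launching, assuming the VITS-style nesting (top-level train/data/model sections) these configs appear to follow; the key names are assumptions:

import json

with open("configs/freevc.json") as f:
    hps = json.load(f)

print(sorted(hps.keys()))                        # e.g. ['data', 'model', 'train']
print(hps.get("data", {}).get("sampling_rate"))  # e.g. 16000 for the 16 kHz models
print(hps.get("train", {}).get("batch_size"))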

References