- Clone the repository:
```bash
git clone https://github.com/natlamir/MeloTTS-Windows.git
cd MeloTTS-Windows
```
- Create the conda environment and install dependencies:
```bash
conda env create -f environment.yml
conda activate melotts-win
pip install -e .
python -m unidic download
```
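After the download finishes, you can optionally sanity-check that the dictionary landed where MeCab will look for it. This check is not part of the original setup; `unidic.DICDIR` is the directory the `unidic` package resolves for you:
```python
# Optional check: the dictionary directory should exist and be non-empty
# after a successful `python -m unidic download`.
import os
import unidic

print(unidic.DICDIR, os.path.isdir(unidic.DICDIR))
```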
If you have trouble doing the download with `python -m unidic download`, you can try this:
- Download the zip from https://cotonoha-dic.s3-ap-northeast-1.amazonaws.com/unidic-3.1.0.zip
- Place it in `C:\Users\YOUR_USER_ID\miniconda3\envs\melotts-win\Lib\site-packages\unidic`
- Rename it to `unidic.zip`
- Replace the `download.py` file in that same directory with the one from https://github.com/natlamir/ProjectFiles/blob/main/melotts/download.py
- Re-run `python -m unidic download`

This info originally came from: myshell-ai#62 (comment)
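If you prefer to script the manual zip placement above, a rough sketch is below. It only downloads the zip into the installed `unidic` package folder; you still need to swap in the replacement `download.py` and re-run `python -m unidic download` as described. Locating the folder via `unidic.__file__` is an assumption that matches a standard pip install:
```python
# Rough sketch: fetch the unidic zip into the unidic package directory as unidic.zip.
import pathlib
import urllib.request

import unidic

url = "https://cotonoha-dic.s3-ap-northeast-1.amazonaws.com/unidic-3.1.0.zip"
dest = pathlib.Path(unidic.__file__).parent / "unidic.zip"
urllib.request.urlretrieve(url, str(dest))
print(f"Saved {dest}")
```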
- Install PyTorch:
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```
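To confirm that the CUDA build of PyTorch was actually installed, you can run a quick check (not part of the original steps; on a CPU-only machine the second value will simply be `False`):
```python
import torch

# Should print the installed version (e.g. a +cu124 build) and True on a CUDA-capable machine.
print(torch.__version__, torch.cuda.is_available())
```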
- Run using:
```bash
melo-ui
```
- In the `melo/data/example` folder, delete the example `metadata.list` file.
- If you need to convert mp3 to wav, create a folder called `mp3s` in the example folder and copy all your mp3 files into the `mp3s` folder.
- With a conda prompt that has the environment activated and is open in the `melo` folder, run `ConvertMp3toWav.bat`. This will create a `data/example/wavs` folder with all of the converted wav files.
- Create a transcript file by running `python transcript.py`, which will create a `data/example/metadata.list` file (the line format is sketched after this list).
- Run `python preprocess_text.py --metadata data/example/metadata.list` to create `train.list`, `config.json`, and other files in the `data/example` folder.
- Modify `config.json` to change the batch size, epochs, learning rate, etc. (see the excerpt after this list).
- From the conda prompt, run `train.bat` to start the training.
- Files will be created within the `data/example/config` folder with the checkpoints and other logging information.
- To test out a checkpoint, run `python infer.py --text "this is a test" -m "C:\ai\MeloTTS-Windows\melo\data\example\config\G_0.pth" -o output`, changing `G_0` to the checkpoint you want to test (`G_1000`, `G_2000`, etc.).
- When you want to use a checkpoint from the UI, create a `melo/custom` folder, copy the `.pth` and `config.json` files over from `data/example/config`, rename the `.pth` to a user-friendly name, and launch the UI to see it in the custom voice dropdown.
- To view TensorBoard, install it with `pip install tensorflow`, then run `tensorboard --logdir=data\example\config`. This will print the local URL where TensorBoard can be viewed.
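For reference, each line of `metadata.list` (and the derived `train.list`) ties a wav file to its speaker, language code, and transcript, separated by pipes. The exact content is produced by `transcript.py`; the lines below only illustrate the format, with made-up paths, speaker name, and text:
```
data/example/wavs/clip_0001.wav|example|EN|This is the transcript of the first clip.
data/example/wavs/clip_0002.wav|example|EN|And this is the transcript of the second clip.
```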
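The generated `config.json` follows the VITS-style layout, so the hyperparameters mentioned above live under its `train` section. A trimmed excerpt is shown below; the values are placeholders rather than recommendations, and your generated file is the authority on the exact keys and defaults:
```json
{
  "train": {
    "epochs": 10000,
    "learning_rate": 0.0003,
    "batch_size": 6,
    "log_interval": 200,
    "eval_interval": 1000
  }
}
```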
MeloTTS is a high-quality multi-lingual text-to-speech library by MIT and MyShell.ai. Supported languages include:
| Language | Example |
| --- | --- |
| English (American) | Link |
| English (British) | Link |
| English (Indian) | Link |
| English (Australian) | Link |
| English (Default) | Link |
| Spanish | Link |
| French | Link |
| Chinese (mix EN) | Link |
| Japanese | Link |
| Korean | Link |
Some other features include:
- The Chinese speaker supports mixed Chinese and English.
- Fast enough for CPU real-time inference.
The Python API and model cards can be found in this repo or on HuggingFace.
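As a quick orientation, a minimal usage sketch of the Python API is shown below (the `EN-US` speaker key, output filename, and `auto` device value are illustrative; see the API docs in this repo for the full details):
```python
from melo.api import TTS

# Load the English model; 'auto' picks a GPU when one is available.
model = TTS(language='EN', device='auto')
speaker_ids = model.hps.data.spk2id  # map of speaker names to ids

# Synthesize a sentence with the American English speaker and write it to a wav file.
model.tts_to_file('This is a test.', speaker_ids['EN-US'], 'output.wav', speed=1.0)
```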
Discord
Join our Discord community and select the `Developer` role upon joining to gain exclusive access to our developer-only channel! Don't miss out on valuable discussions and collaboration opportunities.
Contributing
If you find this work useful, please consider contributing to this repo.
- Many thanks to @fakerybakery for adding the Web UI and CLI part.

Authors
- Wenliang Zhao at Tsinghua University
- Xumin Yu at Tsinghua University
- Zengyi Qin at MIT and MyShell
Citation
```bibtex
@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}
```
This library is under the MIT License, which means it is free for both commercial and non-commercial use.
This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.