PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This fork has been instrumented with Weights & Biases to enable experiment tracking, prediction logging, dataset and model versioning, and hyperparameter optimization.
This implementation uses the LJSpeech dataset.
- NVIDIA GPU + CUDA cuDNN
- Run `pip install -r requirements.txt`
- Run `wandb init` to configure your working directory to log to Weights & Biases.
- Run `python register-data.py` to create a reference Artifact pointing to the LJSpeech dataset (the Artifact calls involved are sketched after this list).
- Run `python split-data.py` to create a versioned train/validation split of the data.
- Run `python register-model ...` to log pre-trained tacotron and waveglow models as Artifacts to Weights & Biases.
- Run `python train.py <dataset-artifact>` to warm-start train tacotron2 on the dataset you created (consuming an Artifact is sketched below).
- Run `python inference.py <tacotron-artifact> <waveglow-artifact> <text>` to run inference on a text file containing newline-delimited sentences. The inference results will be logged to Weights & Biases as a `wandb.Table` (see the final sketch below).
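For orientation, the registration scripts boil down to logging W&B Artifacts. The sketch below shows the general shape of those calls, assuming hypothetical project, artifact, and file names; it is not a copy of `register-data.py` or `register-model`.

```python
import wandb

# Hypothetical project and job names for illustration.
run = wandb.init(project="tacotron2", job_type="register")

# A reference Artifact stores a checksummed pointer to the data,
# leaving the files where they already live.
data_art = wandb.Artifact("ljspeech", type="dataset")
data_art.add_reference("file:///data/LJSpeech-1.1")  # hypothetical local path
run.log_artifact(data_art)

# Pre-trained weights can be logged the same way, uploading the file itself.
model_art = wandb.Artifact("tacotron2-pretrained", type="model")
model_art.add_file("tacotron2_statedict.pt")  # hypothetical checkpoint filename
run.log_artifact(model_art)

run.finish()
```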
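On the consuming side, passing `<dataset-artifact>` to `train.py` presumably resolves to a `use_artifact` call, which both records the dataset version in the run's lineage and fetches a local copy. A minimal sketch, with an assumed artifact name:

```python
import wandb

run = wandb.init(project="tacotron2", job_type="train")

# Declaring the input ties this run to the exact dataset version used.
dataset = run.use_artifact("ljspeech-split:latest")  # assumed name:alias
data_dir = dataset.download()  # local directory containing the split

# ...construct filelists and dataloaders from data_dir, then train...
run.finish()
```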
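Finally, the `wandb.Table` logged by `inference.py` pairs each input sentence with its synthesized audio. A minimal sketch of that logging pattern, where the column names, input filename, and waveform are placeholders (LJSpeech audio is 22050 Hz):

```python
import numpy as np
import wandb

run = wandb.init(project="tacotron2", job_type="inference")

table = wandb.Table(columns=["text", "audio"])  # hypothetical column names
with open("sentences.txt") as f:                # newline-delimited input
    for sentence in f:
        waveform = np.zeros(22050)              # stand-in for synthesized audio
        table.add_data(sentence.strip(), wandb.Audio(waveform, sample_rate=22050))

run.log({"inference_samples": table})
run.finish()
```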
- WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.
- nv-wavenet: a faster-than-real-time WaveNet.
This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.
We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.