Tacotron 2 for the Persian Language
Tacotron 2
PyTorch implementation of Google's Tacotron 2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.
Folder Structure
```
└───tacotron2
    ├───content
    │   └───tacotron2
    │       └───filelists
    ├───filelists
    ├───outdir
    │   └───logdir
    ├───text
    │   ├───data_prepare
    │   └───__pycache__
    └───waveglow
```
Setup
- Step (0): Get your dataset; for the Persian language, the only open-source dataset is Mozilla Common Voice.
- Step (0.1): Note: you can also use our prepared dataset; here is the Kaggle link.
- Step (1): Add your own train and test data parameters in `filelists/`. Because the Mozilla Common Voice corpus contains more than 211 hours of audio, we processed only a small portion of it, converted it to WAV, and removed files longer than 10 seconds; the resulting file lists can be seen in `filelists/` (a preprocessing sketch follows this list).
- Step (2): Install the Python requirements or build the Docker image
  - Install the Python requirements: `pip install -r requirements.txt`
- Step (3): Install CUDA and PyTorch 1.0.
- Step (4): Train the model with this command:
  ```
  python train.py --output_directory='/content/tts-engine/gdrive/My Drive/outdir' --log_directory='/content/tts-engine/gdrive/My Drive/logdir'
  ```
- Step (5): Synthesize audio using `tts-engine/tacotron2/inference.ipynb`.
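
As a rough illustration of Step (1), here is one way the Common Voice preprocessing could look: convert each MP3 clip to a 22.05 kHz mono WAV, drop clips longer than 10 seconds, and write pipe-separated `wav_path|transcript` lines for training and validation. The paths, TSV column names, and the `pydub` dependency are assumptions made for this sketch, not part of this repository.

```python
# Sketch only: build pipe-separated filelists (wav_path|transcript) from a
# Common Voice dump. Assumes a validated.tsv with "path" and "sentence"
# columns, an mp3 clips/ folder, and pydub (which needs ffmpeg installed).
import csv
import random
from pathlib import Path

from pydub import AudioSegment

CV_DIR = Path("cv-corpus/fa")       # hypothetical Common Voice root
OUT_WAV_DIR = Path("wavs")
OUT_WAV_DIR.mkdir(exist_ok=True)
Path("filelists").mkdir(exist_ok=True)
MAX_SECONDS = 10                    # drop long clips, as in Step (1)

entries = []
with open(CV_DIR / "validated.tsv", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        clip = AudioSegment.from_mp3(str(CV_DIR / "clips" / row["path"]))
        if clip.duration_seconds > MAX_SECONDS:
            continue
        wav_path = OUT_WAV_DIR / (Path(row["path"]).stem + ".wav")
        # 22.05 kHz mono WAV matches the default Tacotron 2 hparams
        clip.set_frame_rate(22050).set_channels(1).export(str(wav_path), format="wav")
        entries.append(f"{wav_path}|{row['sentence']}")

random.shuffle(entries)
split = int(0.95 * len(entries))
Path("filelists/train_filelist.txt").write_text("\n".join(entries[:split]), encoding="utf-8")
Path("filelists/val_filelist.txt").write_text("\n".join(entries[split:]), encoding="utf-8")
```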
Audio samples
I have listed some audio samples generated by the model; you can listen to them on SoundCloud.
Model
The model described by the authors can be divided into two parts:
- Spectrogram prediction network
- WaveNet vocoder
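
To make the split concrete, the sketch below shows how the two parts are chained at inference time, roughly mirroring what `inference.ipynb` does in the upstream NVIDIA Tacotron 2 code this repository follows. The checkpoint filenames and the text-cleaner choice are placeholders, and WaveGlow (the `waveglow/` folder) stands in for the WaveNet vocoder.

```python
# Minimal two-stage inference sketch (text -> mel spectrogram -> waveform),
# assuming the NVIDIA Tacotron 2 / WaveGlow code layout; paths are placeholders.
import numpy as np
import torch

from hparams import create_hparams
from model import Tacotron2
from text import text_to_sequence

hparams = create_hparams()

# Part 1: spectrogram prediction network
tacotron = Tacotron2(hparams).cuda().eval()
tacotron.load_state_dict(torch.load("outdir/checkpoint_10000")["state_dict"])

# Part 2: neural vocoder (WaveGlow checkpoint, loaded as in the upstream notebook)
waveglow = torch.load("waveglow_256channels.pt")["model"].cuda().eval()

text = "..."  # Persian input text
# The cleaner list depends on how Persian text is normalized in text/
sequence = np.array(text_to_sequence(text, ["basic_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()

with torch.no_grad():
    _, mel_postnet, _, _ = tacotron.inference(sequence)  # predicted mel spectrogram
    audio = waveglow.infer(mel_postnet, sigma=0.666)      # waveform samples
```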