This is Prism's Guide on how to use Tacotron 2
Things you will need
Computer
Audacity https://www.audacityteam.org/download/
Google Drive account
Audio to make a voice of
First off, watch the Cherry Studios videos. He walks you through it better than I ever could.
Cherry Studios Tutorials
Part 1
Part 2
Training Notebook
Synthesis Notebook
Tacotron2AutoTrim
Tacotron2AutoTrim (GitHub: gabcodedev/Tacotron2AutoTrim) is a handy tool that automatically trims and transcribes audio for use in Tacotron 2. It saves a lot of time, but I would recommend double-checking its output to make sure it gets all the sounds.
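If you want a quick sanity check before training, something like the sketch below can catch the obvious problems. It assumes the transcript file has one clip per line in the `wavs/1.wav|What was said.` layout used by the NVIDIA Tacotron 2 filelists; I have not confirmed that this is exactly what Tacotron2AutoTrim writes. The function name and the paths in the usage note are made up for the example. It only finds missing files and empty lines, so you still need to listen to the clips yourself.

```python
from pathlib import Path


def check_filelist(filelist_path, base_dir="."):
    """Return (line_number, problem) tuples for a 'path|text' transcript file.

    Assumption: one clip per line, audio path and transcript separated by '|'.
    """
    problems = []
    lines = Path(filelist_path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        parts = line.split("|")
        if len(parts) != 2:
            problems.append((number, "expected exactly one '|' separator"))
            continue
        wav, text = parts[0].strip(), parts[1].strip()
        if not text:
            problems.append((number, "empty transcript"))
        # Audio paths are resolved relative to base_dir.
        if not (Path(base_dir) / wav).is_file():
            problems.append((number, f"missing audio file: {wav}"))
    return problems
```

For example, `check_filelist("list.txt", base_dir=".")` returns an empty list when every line has a transcript and points at a file that exists.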
Tips
Things to prioritise for audio: audio quality > same room tone > length.
Audio should be more than 8 minutes; for best results, use 40 minutes to 2 hours.
The more iterations, the better.
Be sure to change It to True so the synthesis works.
Game voice lines and podcasts are perfect for it because of how clear the audio is.
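To see where a dataset falls against the length tip above, here is a rough sketch that adds up the length of every clip in a folder. It assumes the clips are plain PCM `.wav` files, which is what Python's built-in `wave` module can read. The function names and the wording of the messages are mine and not part of any Tacotron 2 tooling.

```python
import wave
from pathlib import Path


def total_minutes(folder):
    """Sum the length of every .wav file directly inside `folder`, in minutes."""
    seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # Duration of one clip = number of frames / frames per second.
            seconds += clip.getnframes() / clip.getframerate()
    return seconds / 60


def length_advice(minutes):
    """Compare a total length against the thresholds given in the tips."""
    if minutes <= 8:
        return "too short: aim for more than 8 minutes"
    if minutes < 40:
        return "usable, but 40 minutes to 2 hours gives the best results"
    if minutes <= 120:
        return "within the 40 minute to 2 hour range"
    return "longer than the 2 hours mentioned in the tips"
```

Calling `length_advice(total_minutes("wavs"))` on the folder of trimmed clips gives a one-line verdict. Remember that length comes last in the priority order, so a short clean set still beats a long noisy one.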
GPU ratings
P100 = Good
V100 = Amazing
T4 = Good
P4 = Bad
K80 = Slow
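If you are not sure which GPU your session has, the sketch below looks the name up in the ratings above. It assumes `nvidia-smi` is available in the session and reports names such as "Tesla T4". The helper names are made up for this example, and any GPU not in my list simply comes back as `None`.

```python
import re
import subprocess

# Ratings copied from the list above.
GPU_RATINGS = {
    "P100": "Good",
    "V100": "Amazing",
    "T4": "Good",
    "P4": "Bad",
    "K80": "Slow",
}


def rate_gpu(name):
    """Return the rating for a GPU name like 'Tesla T4', or None if unlisted."""
    # Word boundaries stop 'P4' from matching other models such as 'P40'.
    match = re.search(r"\b(P100|V100|T4|P4|K80)\b", name)
    return GPU_RATINGS[match.group(1)] if match else None


def detect_gpu_name():
    """Ask nvidia-smi for the name of the first GPU (assumes it is installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

In a notebook cell, `rate_gpu(detect_gpu_name())` gives the rating for the current session, so you can decide whether it is worth starting a long training run.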
Tacotron2 Models
(Will add more)
I will update and revise this later to make it clearer, and I am happy to answer questions if anyone needs help. Please share your results and give feedback.