AITalk getting deep-learning based engine update April 2020


Apr 9, 2018
Link to the PDF:
From AITalk's news section (September 13th): 株式会社AI(エーアイ)

(Thanks to Fumito Fumizuki for pointing this news out in the Discord.)

I translated the important parts:

AITalk is currently at version 4 (AITalk®4). The temporary name for the new engine is AITalk®5.
Vocal synthesis used to be used mainly for robocalls and learning videos, but is now used in smart products with interactive capabilities. Because of that shift, they received a grant covering July 2017 to December 2018 to develop a deep-learning synthesis engine that solves issues with the old engine.

AITalk 4 uses "corpus-based speech synthesis". To meet the need for interactivity, the voices require emotions like happiness, sadness, and anger. Corpus-based speech synthesis needs both a phoneme dictionary and a rhythm dictionary so it can figure out the accent. But that costs a lot, and transitions between emotions are not smooth.

The next-gen AITalk 5 uses deep neural network speech synthesis. The sound quality will go up, the pronunciation will sound more natural/human, and it will switch naturally between emotions such as joy, sadness, and anger. Instead of needing an emotion dictionary like corpus-based speech synthesis does, the deep-learning engine will have fewer files to record and will therefore be less expensive.

The new engine will be available in April 2020.

For those who don't know, products such as GynoidTalk and VOICEROID use AITalk.


New Fan
Jul 18, 2019
I hope it sounds like the new CeVIO demo. Google's WaveNet (AI TTS) has some engine noise, and its banks are inconsistent. With Amazon's Polly (which has both AI and non-AI TTS), I actually prefer the non-AI one most of the time; it just sounds clearer. Excited to hear about a demo.
