Just because it is an "AI" bank doesn't mean that you don't have to do any work. We've already seen how Saki's AI bank works: many people covered the same song and ended up getting the same output. All AI anything does is give you a base to work from. If you want to change how it sounds, you have to put in the work. You've got to remember that the 'A' in 'AI' stands for artificial and not autonomous.
It's Maki, for goodness' sake. AHS is not catering this to the average English Joe for voiceover on his Minecraft videos. This is Maki, a Japanese voice that just so happens to also have an English one. People are buying her because she's Maki. They're catering to a specific niche, and there's nothing wrong with that. Eventually we might get another voice from them that is more suited for general-purpose use, maybe not, but the point still stands that this is the product that was made.
If you want something for native English speakers, get a native English company to make a bank for you, or a company that has multilingual experience. Don't expect a company that has been making Japanese banks for its entire existence to be "oh so amazing" with its first product. We've all seen how most English Vocaloids from Japanese VPs sound; there's no reason to expect it to be different for TTS, especially when it's AI trained. It's clearly not aimed at you.
Yeah, it seems that she learned non-rhotic English, so essentially British. I think it's a nice quirk when everyone is trying to sound American these days.
I wrote an apology on Twitter for writing my humble opinion in an angry tone. I shouldn't have written it so passionately or assumed people would agree. But I have legitimate reasons why Maki is not up to par for me personally.
I use TTS many hours a week; I cannot read long passages without it in English or Japanese. I own CeVIO, VOICEROID, and A.I.VOICE for Japanese and use Microsoft TTS for English. When Maki got announced, I assumed she would be of quality similar to what AI Inc has to offer (on this page, scroll down for audio clips: "Enabling speech synthesis in a variety of languages." AITalk International®).
It was dumb to assume that I would get a certain quality, but at the same time, I REALLY wanted a TTS of a character voice that could narrate to me instead of Microsoft Mark.
When I use the CeVIO/VOICEROID/A.I.VOICE banks, I input the text and do not have to edit it much. It's just type and export. I do not want to tune speech that is supposed to sound natural by default, especially through AI. I don't have to tweak the voices most of the time in VOICEROID/A.I.VOICE, and I perceive Maki as needing excessive tweaking that isn't worth the effort given the sheer amount of TTS narration I need per week. I do not appreciate being told by people (e.g., on Twitter) that I am "using it wrong" because I assumed I could use it for educational narration when they think it's clearly meant only for skits. I spend many, many hours listening to TTS and I know what I want and need from a voice bank.
Also, I legitimately thought that Maki was not 100% understandable, but people got really angry at me because they said they can understand her fully. For me, there is no purpose in owning a voice bank I cannot understand just by listening (because I can't pay attention and repeatedly lose my place while trying to follow along with text, I NEED to be able to understand the voice without seeing the words). She might be the perfect character voice for some people, but I was planning on having a voice to help me with my daily reading and to use for my educational website. I got worked up because Maki isn't how I imagined she would be based on my experience with other TTS.
Lastly, I am very uncomfortable with people accusing me of disliking Japanese accents. I have been speaking in Japanese and English to bilingual Japanese users on Skype since 2008. I am used to accents and like it when people speak English with whatever ability they have. But this is a product and not a person per se, in my humble opinion. My criticisms are purely about the time I expect to spend editing her voice and about some words being said in a way I cannot understand. It's about my accessibility.
We won't know her true quality until a demo page comes out and we can purchase her. And I think we need to realize that TTS has fans who collect unique/novelty voices as well as people who need TTS for actual accessibility. It's like how some English song banks are hard to use due to wonked-up phonemes, whether or not it's a native voice bank.