Cryptonloid voicebank updates, collabs, & concert news (crypton_wat Twitter translations)

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA
WARNING: Twitter embeds are dumb and don't show all the pics on the preview, so make sure you click the actual tweet to see the images I reference.

Er, sorry to burst everyone's bubble, but the Cherry Pie segment was cut out of the livestream.

However, thankfully Eji went to the event and took some photos~! (Eji is someone who basically comments on every Wat tweet ever, is a big Miku fan, and goes to Crypton-related events.) (Thanks @RazzyRu for the tip.)

I'll comment on their giant tweet thread. I won't translate every word because 1) no one cares, 2) I don't have time and it's complicated and makes my brain hurt.

Tweet 1 (contains 4 pics):

I assume since this is a thread about Cherry Pie that all images are from their presentation.

Pic 1:
It's using WaveNet, a deep neural network for generating audio. Real-life examples that use WaveNet are Siri, Google Assistant, Amazon Alexa, and Cortana (so WaveNet generates speech from text for those assistants to read out loud to us).

WaveNet can come in two forms:
1) Concatenative TTS (text to speech): Uses recorded phonemes from a voice actor, so it can sound unnatural and make modifying the voice hard.
2) Parametric TTS: Uses math to recreate sounds; the information needed to recreate them is stored in a model. The characteristics of the output voice are controlled by inputs, and the voice is created using a voice synthesizer called a vocoder.
Based on type two mentioning vocoder, it's safe to assume that Cherry Pie is Parametric TTS (see Pic 4).
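To make the difference between the two types a bit more concrete, here is a toy sketch in Python. It is NOT how WaveNet or Cherry Pie work internally; the "recordings" are stand-in sine waves, and every name in it (`RECORDED_UNITS`, `PHONEME_PARAMS`, `pitch_scale`) is made up for illustration. It only shows why a voice built from fixed recordings is hard to modify, while a voice rebuilt from parameters can be changed by changing the inputs.

```python
import math

SAMPLE_RATE = 8000   # Hz, kept low so the toy lists stay small
UNIT_SAMPLES = 400   # length of one toy "phoneme" in samples

def sine(freq_hz, n_samples, amplitude=1.0):
    """Return a sine wave as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

# Concatenative: a fixed library of "recorded" units (sine stand-ins here).
RECORDED_UNITS = {
    "a": sine(220.0, UNIT_SAMPLES),
    "i": sine(330.0, UNIT_SAMPLES),
}

def concatenative_tts(phonemes):
    """Glue pre-recorded units together; the voice is whatever was recorded."""
    out = []
    for p in phonemes:
        out.extend(RECORDED_UNITS[p])
    return out

# Parametric: nothing is recorded, each sound is rebuilt from numbers.
PHONEME_PARAMS = {
    "a": {"freq_hz": 220.0},
    "i": {"freq_hz": 330.0},
}

def parametric_tts(phonemes, pitch_scale=1.0, amplitude=1.0):
    """Rebuild each sound from parameters; inputs control the voice."""
    out = []
    for p in phonemes:
        freq = PHONEME_PARAMS[p]["freq_hz"] * pitch_scale
        out.extend(sine(freq, UNIT_SAMPLES, amplitude))
    return out

same = concatenative_tts(["a", "i"])
higher = parametric_tts(["a", "i"], pitch_scale=1.5)  # same "text", higher voice
```

With the default inputs the parametric version reproduces the "recorded" output, but only the parametric one can be pushed higher or quieter without recording anything new.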

It says that WaveNet can hear emotion information.

Pic 2:
The thinking bubble on the left says "It's different from my voice... I feel uncomfortable..."
The one on the right says "It's close to my voice! I don't feel uncomfortable!" (the box between them says "process").

It's too blurry and complex for me and I literally don't know the English words for this, but it's basically: There are 2 types of hearing, 1 = the oscillation heard in ear bones, 2 = the oscillation heard in the air. The one that sounds weird is not heard in the bones. A filter uses both ear and bone sounds to sound good.

Pic 3:
This pic talks about the spectral envelope. I don't know enough about it, I don't wanna learn about it, and I don't wanna write about it. :} But it's related to voice color (loudness, pitch) and the WaveNet model.

Pic 4:
Links to the Vocal Drive and Cherry Pie demo we all saw in March. It says that Cherry Pie works in real time and the words associated with it are: vocal effector, VOCODER, voice analysis synthesis, DNN voice quality conversion (DNN = deep neural network).


Tweet 2 (contains 3 pics):

Pic 1 & 2:
Explains in great detail how vocoders/Cherry Pie work (about it being real time, about the spectral envelope, about the algorithm...). It says that the synthesis has a latency of 23-46 msec and that they have a low latency mode. This also mentions the F0, which Ryo said on Twitter was the most important thing in the process (you basically set the pitches arbitrarily and the F0 is where it starts and confusing crud like that, but if it's wrong, it wonks up how everything sounds).
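A side note on the 23-46 msec figure: those numbers happen to line up with common audio buffer sizes. This is purely my guess, since the slides don't say where the range comes from; the 44.1 kHz sample rate and the 1024/2048-sample buffers below are assumptions, not something from the presentation.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Time it takes to fill one audio buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz

# Assumed values: CD-quality sample rate and two power-of-two buffer sizes.
SAMPLE_RATE_HZ = 44100
for size in (1024, 2048):
    print(size, "samples:", round(buffer_latency_ms(size, SAMPLE_RATE_HZ), 1), "ms")
```

If that guess is right, a "low latency mode" could simply mean processing smaller chunks of audio at a time.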

Pic 3:
This is the most interesting pic, so I'm putting it here so we can see it better:
[Attached image]
A commenter named Orahi pointed out that CV01 = Miku, CV02 R = Rin, CV02 L = Len. No sign of Luka, Meiko, or Kaito (even though nyanyannya got to use Kaito in their demo).
Another thing to note is Bel Canto #1 and #2, which is apparently Italian for "beautiful singing". Wonder what that preset does?

Lastly, it's weird that it says "Male 2 Female / Female 2 Male" instead of "Male to Female / Female to Male" (where it's "input > output" for the voices). Crypton got that rad leetspeak goin' on. So I guess "Male 2 CV01" = "Male to Miku".

I assume "Shifter+/-" moves the pitch up or down.
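If "Shifter+/-" really is a pitch shifter (again, an assumption), and if it works in semitones, the usual equal-temperament math would look like the sketch below. The function name and the semitone units are mine, not anything shown in the screenshot.

```python
def shift_f0(f0_hz, semitones):
    """Move a pitch up (+) or down (-) by a number of semitones."""
    return f0_hz * 2 ** (semitones / 12)

print(shift_f0(220.0, 12))   # one octave up
print(shift_f0(220.0, -12))  # one octave down
```

Twelve semitones doubles or halves the frequency, which is roughly the size of jump you'd expect for something like a male-to-female conversion.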
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
570
Wow, thanks for all the work you did on this!

With the reference to WaveNet, is the implication that Cherry Pie relies on WaveNet to some extent? If that's the case, I wonder how they're coping with the lag as your requests to WaveNet go across the internet. Maybe it's possible for Cherry Pie to get a local copy of some applicable bit of WaveNet's logic/output or something--like after you switch presets, say.

However it works, they must have some reasonable basis for the 23-46 msec rating for synthesis (Though I guess you could question if synthesis is all of the process, internet request included, or just part of it.). I guess performance can't be any worse than you would experience talking to Siri/Alexa/etc. And the vocal conversion wouldn't be trying to get search results/etc., like the virtual assistants do.
 

RazzyRu

Designer
Staff member
Administrator
Apr 8, 2018
262
razzyru.com
Thank you for keeping up with the information and reporting it. :kokone_lili: I've been following and lately it's perked up my interest even more so. So, I am eager to learn more about it as time goes on. It's definitely interesting and rather fresh, I believe.

A commenter named Orahi pointed out that CV01 = Miku, CV02 R = Rin, CV02 L = Len. No sign of Luka, Meiko, or Kaito (even though nyanyannya got to use Kaito in their demo).
Another thing to note is Bel Canto #1 and #2, which is apparently Italian for "beautiful singing". Wonder what that preset does?
Orahi and I did theorize that the others may be added later on. We don't have any proof on this, it's only a theory. I think it would make sense for them to at least add Luka. Maybe after launch there will be options to download or buy more presets?
I wonder what the Bel Canto preset is like too. Honestly, I want to try many. In fact, now I wonder if making a "Miku Let's Play" or "Rin/Len Let's Play" channel is much more do-able now thanks to this. I'm just curious how the Vocaloid community will handle Cherry Pie overall. I'm looking forward to it.
 

Wario94

Aspiring Fan
Jan 5, 2019
38
24
Don't know if you're aware of it or not, but Eji posted on his Twitter account that Cherry Pie has already been released and it's 100% free!
 

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA
@Wario94 @RogerDelmar

Unfortunately, Eji is not talking about downloading Cherry Pie for free.

He is talking about downloading a scientific paper in PDF form about the implementation of Cherry Pie from the website of the Information Processing Society of Japan (IPSJ). You need to sign up for an account in order to download it, and it costs different amounts depending on whether you're an IPSJ member or not. From his tweet, 自由ダウンロード = jiyuu download = freely (as in "unrestricted", NOT "no cost") download. If it was an actual free download, it would be 無料ダウンロード = muryou download = free download (download free of charge).

tl;dr Eji is pointing people to the download of a scientific paper about Cherry Pie, not announcing a download link to get Cherry Pie.
 

Wario94

Aspiring Fan
Jan 5, 2019
38
24
OOPS!😫🤣
 

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA

I'm planning to gradually resume Twitter. Recently, various things happened, but the latest was things like meeting Yamaha's [Hideki] Kenmochi and formerly Sega's [Shuuji] Utsumi.
Hideki Kenmochi is "the father of VOCALOID". He basically helped develop the vocal synth tech as far back as Leon and Lola (not sure if he helped with Miriam, don't really care to research that much for this tweet). He retired in 2015.

Shuuji Utsumi used to work at Sega, but doesn't anymore. Here's the stuff he's worked on.

Based on the fact that these men are no longer with Yamaha/Sega, it seems like it won't lead to any new product or collab.

I'm hoping Wat resuming tweeting means we can see the light at the end of the tunnel with the whole Cherry Pie/Vocal Drive and Appends stuff.
 

Aruku

「会いたかった」
May 13, 2018
28
This wasn't shared here, but if you remember last year's Kawasaki Jazz with the Miku and Luka performance, it is going to be held again, but this time Miku's partner is going to be Rin. In there the singing and dancing wasn't pre-recorded, so Cherry Pie is most definitely going to be used again if they use the same technique. (I didn't read anywhere that they wouldn't, but I'm not 100% sure.)


Footage of that performance didn't emerge online as far as I know, so I don't know how good it was. It would be interesting to see how much the technology has evolved since last year though... I'm looking forward to it either way!

Wat also retweeted the new Nintendo Switch Lite, I found that cute haha
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
570
(Posting a reply to the Voice Synth Sales thread here to stay on-topic.)

I wish I knew whether, when their Appends come out, we'll HAVE to have their previous version in order to get them to work. OTL
Absolutely pure speculation...but doesn't it get your hopes up a bit that the first sale ever is occurring now, given the upcoming appends? (Doesn't mean we'll need the originals or not; you could say the same thing if we thought a V5 version was on the horizon.)

I mean, you'd think that, for the first-ever sale, something unusual must be causing it.
 

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA
Yeah. I think since Crypton never does sales, it seems odd to finally have a sale. It's not even a holiday or birthday sale. So, it seems random. I wish I knew if I should spend my money or not. (There's a 50% chance we'll waste our money on this sale and regret it.)

Doesn't Crypton do a thing where you can upgrade for a discount if you purchased a bank right before they start selling a new generation? (Like 3 months grace period or something?)

RECAP:
July 9th:
Wat said he's resuming Twitter activities (aka hopefully going back to humble bragging about Cherry Pie and the Cryptonloid Appends). The last important tweet I translated from Wat was back on MARCH 5th when they announced Cherry Pie on the Labopton Blog. (What a LOOOOONG hiatus!)

July 16th: The first Cryptonloid sale in forever gets announced. I looked back on the Sonicwire blog and couldn't find any history of sales for Luka V4X, Miku V4X, or Rin/Len V4X. I saw Meiko V3 was on sale once in 2014, and Kaito V3 went on sale in 2013. The 20% off campaign saying it's the first time they've been on sale is (basically) true.

October 14th / December 23rd: At Miku Symphony, Miku and Meiko will be the hosts. Meaning Meiko's 3D model will have to be finished (I still haven't seen it, I don't think I slept on it coming out finally) and they will probably use her updated voice with Cherry Pie for the talking bits.

November 17: At Kawasaki Jazz 2019, Miku and Rin will be the hosts. So I also assume we'll hear Rin talking with her updated voice and Cherry Pie.
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
570
Doesn't Crypton do a thing where you can upgrade for a discount if you purchased a bank right before they start selling a new generation? (Like 3 months grace period or something?)
I think I remember reading about such a thing happening once. But under the assumption that the appends come out Aug. 1 (which obviously we don't know, but for the sake of theory/argument), I'm dubious about if they would do that this time, since they're on sale beforehand. Could happen, but it strikes me as an either/or way of handling the transition from one generation to another (Though I guess the appends wouldn't be a whole new generation, technically.).
 

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA
I was thinking about it and am dying to know what the Appends truly are.

I'm confused about:

1) Do you HAVE to own Miku V4X (for example) in order to use her mystery Append (like how you have to own Miku V2 in order to use Miku Append)?

2. When I think of "appends", it's just adding extra stuff to the stuff that's already there. The original Miku V2 Append added new voice banks in addition to her V2.
However, these new mystery Appends are not (to my knowledge) new at all, they are old voices edited to have clearer pronunciation and shorter samples.
Why would you add essentially the same thing to the same product (fake example: 2 Miku Vivid voices, one has shorter samples)?

3. If you have to own both the new and old banks, what if the Append only affects/changes the Japanese voices, because Wat literally never said they edited the English banks? (He said they were working toward "multilingualization", but never explained what that meant: whether it was extra phonemes, updating their English banks, or new language banks.)

4. What if the Appends are not actually Vocaloid software and instead are exclusively some new software similar to Voidol?
We saw it being demonstrated in real time as a VST, but I realized it's strange/confusing. If you buy a voice, do you get a Vocaloid version AND a Cherry Pie version? (Because Cherry Pie uses patterns from deep learning, it wouldn't be attached to the Vocaloid software. The patterns would be in the Cherry Pie VST.)
Or are the new Appends only meant for Cherry Pie? But Wat said that you could tune by hand OR use Cherry Pie, so it seems like you have the option to tune like normal or do it by speaking/singing in real time. I'm just so confused.

I'm dying inside. This seems like cruel and unusual punishment. When we know the Appends are coming, they suddenly put the old voices on sale? When we have no information to help us decide if we're wasting our money or not? Or even when the Appends are coming?

I feel like maybe rather than saving 20%, it would be better to wait and not buy one, because if the Appends come out and they don't require the previous version, you'll have blown like $130 and have to buy them twice, which is more of a loss compared to buying it at regular vs 20% off. Erg...
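To put rough numbers on that gut feeling, here's a small sketch. The $162.50 regular price is just what makes 20% off come out to "like $130", and the model assumes you only want the old bank as a stepping stone to the Append (if you'd enjoy the V4X voice on its own, the math changes). All names and figures are hypothetical.

```python
def expected_regret(regular_price, discount, p_append_needs_base):
    """Return (regret if you buy now, regret if you wait), in dollars.

    p_append_needs_base: your guess (0..1) that the Append requires the old bank.
    """
    sale_price = regular_price * (1 - discount)
    # Buy now: the money is wasted only if the Append turns out to be standalone.
    buy_now = (1 - p_append_needs_base) * sale_price
    # Wait: you only lose the missed discount, and only if the base is required.
    wait = p_append_needs_base * (regular_price - sale_price)
    return buy_now, wait

buy_now, wait = expected_regret(162.50, 0.20, 0.5)  # coin-flip guess
print("buy now:", buy_now, "wait:", wait)
```

Under these assumptions, buying on sale only comes out ahead if you're more than 80% sure the Append needs the old bank (the break-even is 1 minus the discount), so with a coin-flip guess, waiting is the safer bet.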
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
570
1) Do you HAVE to own Miku V4X (for example) in order to use her mystery Append (like how you have to own Miku V2 in order to use Miku Append)?

2. When I think of "appends", it's just adding extra stuff to the stuff that's already there. The original Miku V2 Append added new voice banks in addition to her V2.
However, these new mystery Appends are not (to my knowledge) new at all, they are old voices edited to have clearer pronunciation and shorter samples.
Why would you add essentially the same thing to the same product (fake example: 2 Miku Vivid voices, one has shorter samples)?
I'd assume that we should look to how CFM refers to their own products for guidance here. I could easily be missing something, but the only time I'm personally aware that they used the word "append" was for things like Miku's V2 appends--even her subsequent additional voice colors in V3 (which were sometimes very similar to the original appends) weren't referred to that way. So I'd assume these might require the base VB to work.

That personally makes me scratch my head a little when I think that all the CFM-loids are getting appends, so even older Vocaloids like Kaito/Meiko are apparently getting V3 appends while everyone else is getting V4...which seems odd to me.

Regarding the samples, I assume there's some benefit to the sample size. Hard drive space is a sunk cost, but if the samples are shorter, might it not make the singing smoother when those samples are joined together? And if they're clearer, that's always a plus.
4. What if the Appends are not actually Vocaloid software and instead are exclusively some new software similar to Voidol?
We saw it being demonstrated in real time as a VST, but I realized it's strange/confusing. If you buy a voice, do you get a Vocaloid version AND a Cherry Pie version? (Because Cherry Pie uses patterns from deep learning, it wouldn't be attached to the Vocaloid software. The patterns would be in the Cherry Pie VST.)
Or are the new Appends only meant for Cherry Pie? But Wat said that you could tune by hand OR use Cherry Pie, so it seems like you have the option to tune like normal or do it by speaking/singing in real time. I'm just so confused.
Are the Cherry Pie patterns actually the voices, though? Or could they be more like alterations to Vocaloid parameter patterns over time? For instance, maybe it could listen to a person and come up with something like:

1 sec: +2 BRE, +3 DYN, -1 GEN
2 sec: -3 CLE, -2 DYN, +2 GEN

The above parameter changes being relative to the person's basic voice--which you would need to set it up with some kind of baseline for, I'd think, as a reference.

(It seems to me like even just watching what a person's voice was doing, translating that into a record of different pitches/etc., and mapping that somehow to the Vocaloid parameters would be a reasonable application for AI/deep learning. Just like it's currently being used to identify people's faces/stop signs/etc. in pictures, or the words people say in recorded audio. I could imagine CFM having something like a library of audio samples. Some people had to tune a voice to match each sample using traditional means, and they then fed the samples and VSQXs to an AI so it could learn to do what the human tuners did.)

If Cherry Pie could do that, it could lay that pattern on top of the track regardless of the actual VB being used. You could then use the changes it made, or tweak them yourself manually the usual way.
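Here's a rough Python sketch of the idea, just to show what I mean by laying relative changes on top of a baseline. The baseline values and the 0-127 clamp are my assumptions, and nothing here is based on how Cherry Pie actually stores anything.

```python
# Hypothetical baseline for the singer's "basic voice" (assumed values).
BASELINE = {"BRE": 0, "DYN": 64, "GEN": 64, "CLE": 0}

# The made-up timeline from above: (time in seconds, changes relative to baseline).
DELTAS = [
    (1.0, {"BRE": +2, "DYN": +3, "GEN": -1}),
    (2.0, {"CLE": -3, "DYN": -2, "GEN": +2}),
]

def apply_deltas(baseline, deltas, lo=0, hi=127):
    """Turn relative changes into absolute parameter values per time point."""
    track = []
    for t, change in deltas:
        frame = dict(baseline)  # every frame starts from the baseline
        for name, amount in change.items():
            # Clamp so a negative change can't push a parameter out of range.
            frame[name] = max(lo, min(hi, frame[name] + amount))
        track.append((t, frame))
    return track

for t, frame in apply_deltas(BASELINE, DELTAS):
    print(t, frame)
```

Since the output is just parameter values over time, it wouldn't care which VB is loaded, and you could still hand-edit the result afterward.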

Of course, this is all me just doing some blue sky theory :miku2_move:.
 

uncreepy

Hobbyist translator
Apr 9, 2018
450
USA
Regarding the samples, I assume there's some benefit to the sample size. Hard drive space is a sunk cost, but if the samples are shorter, might it not make the singing smoother when those samples are joined together? And if they're clearer, that's always a plus.
I could be wrong, but I think the shorter samples were because shorter samples = less engine noise (I THINK it's because the sample loops more than once depending on how long the note is). They originally went with longer samples for V4 because they were inspired by the UTAU named Gahata Meiji, but it didn't turn out very well. But for the Appends, they are using extremely short UTAU-style samples.
The decision isn't related to computer space, I think, Wat never mentioned it as being a reason. Maybe the shorter samples also work better with Cherry Pie?
==============
Okay, I legit have a tinfoil hat theory on what exactly is going on. I've been brooding over it since yesterday. (First world problem, I know. But it's related to my translation efforts spanning about 2 years, my money, and my remaining sanity.) I am going to attempt to convince myself and other people that Cherry Pie isn't related to Vocaloid. Is it true? I don't know.

Reminders:
Link to Cherry Pie demo
Link to nyanyannya's tuned demo
Link to nyanyannya's untuned demo

I went back and paid close attention to the Vocal Drive/Cherry Pie demo to break down exactly what we are looking at.

Vocal Drive:
[Attached image]


About Cherry Pie and its pitch control and spectral control:
[Attached image]

^ Doesn't editing the spectral envelope dials sound like Vocal Drive? What if Vocal Drive is just the easy way for normies to make voice growl texture?


Cherry Pie's voice conversion:
[Attached image]


I realized the interface has updated a bit since the March demo:
[Attached image]


Thoughts on "tuned" vs "untuned" based on nyanyannya's demos:
[Attached image]


So, the stuff I'm confused about is:
If you buy a new Append voice, is there a folder that has an installer for... Vocaloid AND the new VSTs (Cherry Pie and Vocal Drive and the other mystery ones)???

Or is it like the VSTs built in to Vocaloid 5?
[Attached image]


You can put the Vocaloid 5 VST into any DAW. You can clearly put Cherry Pie and Vocal Drive into any DAW, too. Does that mean you can edit a Vocaloid's track with Cherry Pie?

My thinking was that Cherry Pie is unconnected to Vocaloid, like how Yoshida (Voiceroid) and Zunko (Vocaloid and Voiceroid) are also in Voidol/R.C. Voice (for real-time voice conversion unrelated to Vocaloid).

I don't know what to think based on what I wrote.
 
