
Other Cryptonloid voicebank updates, collabs, & concert news (crypton_wat Twitter translations)

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
WARNING: Twitter embeds are dumb and don't show all the pics on the preview, so make sure you click the actual tweet to see the images I reference.

Er, sorry to burst everyone's bubble, but the Cherry Pie segment was cut out of the livestream.

However, thankfully Eji went to the event and took some photos~! (Eji is someone who basically comments on every Wat tweet ever, is a big Miku fan, and goes to Crypton-related events.) (Thanks @RazzyRu for the tip.)

I'll comment on their giant tweet thread. I won't translate every word because 1) no one cares, 2) I don't have time and it's complicated and makes my brain hurt.

Tweet 1 (contains 4 pics):

I assume since this is a thread about Cherry Pie that all images are from their presentation.

Pic 1:
It's using WaveNet, a deep neural network for generating audio. Real-life examples that use WaveNet are Siri, Google Assistant, Amazon Alexa, and Cortana (so WaveNet generates speech from text for those assistants to read out loud to us).

WaveNet can come in two forms:
1) Concatenative TTS (text to speech): Uses recorded phonemes from a voice actor, so it can sound unnatural and make modifying the voice hard.
2) Parametric TTS: Uses math to recreate sounds; the information needed to recreate them is stored in a model. The characteristics of the output voice are controlled by the inputs, and the voice is created using a voice synthesizer called a vocoder.
Based on type two mentioning vocoder, it's safe to assume that Cherry Pie is Parametric TTS (see Pic 4).

It says that WaveNet can hear emotion information.

Pic 2:
The thinking bubble on the left says "It's different from my voice... I feel uncomfortable..."
The one on the right says "It's close to my voice! I don't feel uncomfortable!" (the box between them says "process").

It's too blurry and complex for me and I literally don't know the English words for this, but it's basically: There are 2 types of hearing, 1 = the oscillation heard in ear bones, 2 = the oscillation heard in the air. The one that sounds weird is not heard in the bones. A filter uses both ear and bone sounds to sound good.

Pic 3:
This pic talks about the spectral envelope. I don't know enough about it, I don't wanna learn about it, and I don't wanna write about it. :} But it's related to voice color (loudness, pitch) and the WaveNet model.

Pic 4:
Links to the Vocal Drive and Cherry Pie demo we all saw in March. It says that Cherry Pie works in real time and the words associated with it are: vocal effector, VOCODER, voice analysis synthesis, DNN voice quality conversion (DNN = deep neural network).


Tweet 2 (contains 3 pics):

Pic 1 & 2:
Explains in great detail how vocoders/Cherry Pie work (about it being real time, about the spectral envelope, about the algorithm...). It says that the synthesis has a latency of 23-46 msec and that they have a low latency mode. This also mentions the F0 (the fundamental frequency, i.e. the base pitch of the voice), which Ryo said on Twitter was the most important thing in the process (you basically set the pitches arbitrarily and the F0 is where it starts and confusing crud like that, but if it's wrong, it wonks up how everything sounds).
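As a rough sanity check on those figures: 23 and 46 msec happen to match the length of 1024- and 2048-sample audio blocks at 44.1 kHz. This is only an assumption (the slides as described don't state a sample rate or block size, so both are hypothetical here), but a minimal sketch of the arithmetic looks like this:

```python
# Sketch: how long one block of audio lasts, assuming a 44.1 kHz sample rate.
# The sample rate and block sizes are guesses, not values from the presentation.
SAMPLE_RATE_HZ = 44_100  # assumption: CD-quality audio


def block_latency_ms(block_size: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return the duration of one block of samples in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


for size in (1024, 2048):
    print(f"{size} samples -> {block_latency_ms(size):.1f} ms")
```

If that reading is right, the quoted latency would describe local audio buffering rather than a round trip to a server, but nothing in the tweets confirms it.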

Pic 3:
This is the most interesting pic, so I'm putting it here so we can see it better:
[Image attachment 2065]
A commenter named Orahi pointed out that CV01 = Miku, CV02 R = Rin, CV02 L = Len. No sign of Luka, Meiko, or Kaito (even though nyanyannya got to use Kaito in their demo).
Another thing to note is Bel Canto #1 and #2, which is apparently Italian for "beautiful singing". Wonder what that preset does?

Lastly, it's weird that it says "Male 2 Female / Female 2 Male" instead of "Male to Female / Female to Male" (where it's "input > output" for the voices). Crypton got that rad leetspeak goin' on. So I guess "Male 2 CV01" = "Male to Miku".

I assume "Shifter+/-" moves the pitch up or down.
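If "Shifter+/-" really is a pitch shifter, the usual math behind such a control can be sketched as below. That it works in semitones is an assumption (the screenshot doesn't say what units it uses), and `shift_pitch_hz` is a hypothetical helper name:

```python
def shift_pitch_hz(f0_hz: float, semitones: float) -> float:
    """Shift a fundamental frequency by a number of equal-tempered semitones.

    Assumption: the shifter works in semitones; each semitone multiplies
    the frequency by 2 ** (1 / 12), so 12 semitones is one octave.
    """
    return f0_hz * 2.0 ** (semitones / 12.0)


print(shift_pitch_hz(220.0, 12))   # one octave up
print(shift_pitch_hz(220.0, -12))  # one octave down
```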
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
2,036
Wow, thanks for all the work you did on this!

With the reference to WaveNet, is the implication that Cherry Pie relies on WaveNet to some extent? If that's the case, I wonder how they're coping with the lag as your requests to WaveNet go across the internet. Maybe it's possible for Cherry Pie to get a local copy of some applicable bit of WaveNet's logic/output or something--like after you switch presets, say.

However it works, they must have some reasonable basis for the 23-46 msec rating for synthesis (Though I guess you could question if synthesis is all of the process, internet request included, or just part of it.). I guess performance can't be any worse than you would experience talking to Siri/Alexa/etc. And the vocal conversion wouldn't be trying to get search results/etc., like the virtual assistants do.
 

razelberii

Bless the Lord, O my Soul
Apr 8, 2018
423
雨リカ
razzyru.com
Thank you for keeping up with the information and reporting it. :kokone_lili: I've been following, and lately it's piqued my interest even more. So, I am eager to learn more about it as time goes on. It's definitely interesting and rather fresh, I believe.

A commenter named Orahi pointed out that CV01 = Miku, CV02 R = Rin, CV02 L = Len. No sign of Luka, Meiko, or Kaito (even though nyanyannya got to use Kaito in their demo).
Another thing to note is Bel Canto #1 and #2, which is apparently Italian for "beautiful singing". Wonder what that preset does?
Orahi and I did theorize that the others may be added later on. We don't have any proof on this, it's only a theory. I think it would make sense for them to at least add Luka. Maybe after launch there will be options to download or buy more presets?
I wonder what the Bel Canto preset is like too. Honestly, I want to try many. In fact, now I wonder if making a "Miku Let's Play" or "Rin/Len Let's Play" channel is much more do-able now thanks to this. I'm just curious how the Vocaloid community will handle Cherry Pie overall. I'm looking forward to it.
 
  • Like
Reactions: Wario94

Wario94

Passionate Fan
Jan 5, 2019
217
30
Don't know if you're aware of it or not, but Eji posted on his Twitter account that Cherry Pie has already been released and it's 100% free!
 
  • Wow
Reactions: Jikyu

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
@Wario94 @RogerDelmar

Unfortunately, Eji is not talking about downloading Cherry Pie for free.

He is talking about downloading a scientific paper in PDF form about the implementation of Cherry Pie from the website of the Information Processing Society of Japan (IPSJ). You need to sign up for an account in order to download it, and it costs different amounts depending on whether you're an IPSJ member or not. From his tweet, 自由ダウンロード = jiyuu download = freely (as in "unrestricted", NOT "no cost") download. If it was an actual free download, it would be 無料ダウンロード = muryou download = free download (download free of charge).

tl;dr Eji is pointing people to the download of a scientific paper about Cherry Pie, not announcing a download link to get Cherry Pie.
 

Wario94

Passionate Fan
Jan 5, 2019
217
30
@Wario94 @RogerDelmar

Unfortunately, Eji is not talking about downloading Cherry Pie for free.

He is talking about downloading a scientific paper in PDF form about the implementation of Cherry Pie from the website of the Information Processing Society of Japan (IPSJ). You need to sign up for an account in order to download it, and it costs different amounts depending on whether you're an IPSJ member or not. From his tweet, 自由ダウンロード = jiyuu download = freely (as in "unrestricted", NOT "no cost") download. If it was an actual free download, it would be 無料ダウンロード = muryou download = free download (download free of charge).

tl;dr Eji is pointing people to the download of a scientific paper about Cherry Pie, not announcing a download link to get Cherry Pie.
OOPS!😫🤣
 
  • Haha
Reactions: Jikyu

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618

I plan to gradually resume Twitter. Various things have happened recently, but the latest was meeting Yamaha's [Hideki] Kenmochi and formerly Sega's [Shuuji] Utsumi.
Hideki Kenmochi is "the father of VOCALOID". He basically helped develop the vocal synth tech as far back as Leon and Lola (not sure if he helped with Miriam; I don't really care to research that much for this tweet). He retired in 2015.

Shuuji Utsumi used to work at Sega, but doesn't anymore. Here's the stuff he's worked on.

Based on the fact that these men are no longer with Yamaha/Sega, it seems like it won't lead to any new product or collab.

I'm hoping Wat resuming tweeting means we can see the light at the end of the tunnel with the whole Cherry Pie/Vocal Drive and Appends stuff.
 

Ceres

「会いたかった」
May 13, 2018
68
This wasn't shared here, but if you remember last year's Kawasaki Jazz with the Miku and Luka performance, it is going to be held again, but this time Miku's partner is going to be Rin. The singing and dancing there weren't pre-recorded, so Cherry Pie is most definitely going to be used again if they use the same technique. (I didn't see anywhere that they wouldn't, but I'm not 100% sure.)


Footage of that performance didn't emerge online as far as I know, so I don't know how good it was. It would be interesting to see how much the technology has evolved since last year, though... I'm looking forward to it either way!

Wat also retweeted the new Nintendo Switch Lite; I found that cute, haha.
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
2,036
(Posting a reply to the Voice Synth Sales thread here to stay on-topic.)

I wish I knew whether, when their Appends come out, we HAD to have the previous version in order to get them to work. OTL
Absolutely pure speculation...but doesn't it get your hopes up a bit that the first sale ever is occurring now, given the upcoming appends? (Doesn't mean we'll need the originals or not; you could say the same thing if we thought a V5 version was on the horizon.)

I mean, you'd think that, for the first-ever sale, something unusual must be causing it.
 
  • Like
Reactions: uncreepy

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
Yeah. I think since Crypton never does sales, it seems odd to finally have a sale. It's not even a holiday or birthday sale. So, it seems random. I wish I knew if I should spend my money or not. (There's a 50% chance we'll waste our money on this sale and regret it.)

Doesn't Crypton do a thing where you can upgrade for a discount if you purchased a bank right before they start selling a new generation? (Like 3 months grace period or something?)

RECAP:
July 9th:
Wat said he's resuming Twitter activities (aka hopefully going back to humble bragging about Cherry Pie and the Cryptonloid Appends). The last important tweet I translated from Wat was back on MARCH 5th when they announced Cherry Pie on the Labopton Blog. (What a LOOOOONG hiatus!)

July 16th: The first Cryptonloid sale in forever gets announced. I looked back on the Sonicwire blog and couldn't find any history of sales for Luka V4X, Miku V4X, or Rin/Len V4X. I saw Meiko V3 was on sale once in 2014, and Kaito V3 went on sale in 2013. The 20% off campaign saying it's the first time they've been on sale is (basically) true.

October 14th / December 23rd: At Miku Symphony, Miku and Meiko will be the hosts. Meaning Meiko's 3D model will have to be finished (I still haven't seen it, I don't think I slept on it coming out finally) and they will probably use her updated voice with Cherry Pie for the talking bits.

November 17: At Kawasaki Jazz 2019, Miku and Rin will be the hosts. So I also assume we'll hear Rin talking with her updated voice and Cherry Pie.
 
  • Like
Reactions: ngoomie

mobius017

Aspiring ∞ Creator
Apr 8, 2018
2,036
Doesn't Crypton do a thing where you can upgrade for a discount if you purchased a bank right before they start selling a new generation? (Like 3 months grace period or something?)
I think I remember reading about such a thing happening once. But under the assumption that the appends come out Aug. 1 (which obviously we don't know, but for the sake of theory/argument), I'm dubious about if they would do that this time, since they're on sale beforehand. Could happen, but it strikes me as an either/or way of handling the transition from one generation to another (Though I guess the appends wouldn't be a whole new generation, technically.).
 

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
I was thinking about it and am dying to know what the Appends truly are.

I'm confused about:

1. Do you HAVE to own Miku V4X (for example) in order to use her mystery Append (like how you have to own Miku V2 in order to use Miku Append)?

2. When I think of "appends", it's just adding extra stuff to the stuff that's already there. The original Miku V2 Append added new voice banks in addition to her V2.
However, these new mystery Appends are not (to my knowledge) new at all, they are old voices edited to have clearer pronunciation and shorter samples.
Why would you add essentially the same thing to the same product (fake example: 2 Miku Vivid voices, one has shorter samples)?

3. If you have to own both the new and old banks, what if the Append only affects/changes the Japanese voices, because Wat literally never said they edited the English banks? (He said they were working toward "multilingualization", but never explained what that meant: whether it was extra phonemes, updating their English banks, or new language banks.)

4. What if the Appends are not actually Vocaloid software and instead are exclusively some new software similar to Voidol?
We saw it being demonstrated in real time as a VST, but I realized it's strange/confusing. If you buy a voice, do you get a Vocaloid version AND a Cherry Pie version? (Because Cherry Pie uses patterns from deep learning, it wouldn't be attached to the Vocaloid software. The patterns would be in the Cherry Pie VST.)
Or are the new Appends only meant for Cherry Pie? But Wat said that you could tune by hand OR use Cherry Pie, so it seems like you have the option to tune like normal or do it by speaking/singing in real time. I'm just so confused.

I'm dying inside. This seems like cruel and unusual punishment. When we know the Appends are coming, they suddenly put the old voices on sale? When we have no information to help us decide if we're wasting our money or not? Or even when the Appends are coming?

I feel like maybe rather than saving 20%, it would be better to wait and not buy one, because if the Appends come out and it doesn't require the previous version, you'll have blown like $130 and have to buy them twice, which is more of a loss compared to buying it at regular vs 20% off. Erg...
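To put rough numbers on that trade-off, here is a minimal sketch. The $160 regular price is a made-up figure (picked so the 20%-off price lands near the ~$130 mentioned above), `sale_regret` is a hypothetical helper, and the two outcomes are simply the two ways the guess can go wrong:

```python
def sale_regret(regular_price: float, discount_percent: int = 20) -> dict:
    """Money lost in each 'guessed wrong' outcome of the sale dilemma."""
    sale_price = regular_price * (100 - discount_percent) / 100
    return {
        # Bought on sale, but the Append turns out not to need the old bank.
        "bought_but_not_needed": sale_price,
        # Skipped the sale, but the Append does need the old bank,
        # so the old bank is bought later at full price.
        "waited_but_needed": regular_price - sale_price,
    }


REGULAR_PRICE = 160.0  # hypothetical price, not taken from Sonicwire
losses = sale_regret(REGULAR_PRICE)
print(losses)
```

Under these assumptions, buying wrongly costs four times as much as waiting wrongly, which is the asymmetry described above.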
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
2,036
1. Do you HAVE to own Miku V4X (for example) in order to use her mystery Append (like how you have to own Miku V2 in order to use Miku Append)?

2. When I think of "appends", it's just adding extra stuff to the stuff that's already there. The original Miku V2 Append added new voice banks in addition to her V2.
However, these new mystery Appends are not (to my knowledge) new at all, they are old voices edited to have clearer pronunciation and shorter samples.
Why would you add essentially the same thing to the same product (fake example: 2 Miku Vivid voices, one has shorter samples)?
I'd assume that we should look to how CFM refers to their own products for guidance here. I could easily be missing something, but the only time I'm personally aware that they used the word "append" was for things like Miku's V2 appends--even her subsequent additional voice colors in V3 (which were sometimes very similar to the original appends) weren't referred to that way. So I'd assume these might require the base VB to work.

That personally makes me scratch my head a little when I think that all the CFM-loids are getting appends, so even older Vocaloids like Kaito/Meiko are apparently getting V3 appends while everyone else is getting V4...which seems odd to me.

Regarding the samples, I assume there's some benefit to the sample size. Hard drive space is a sunk cost, but if the samples are shorter, might it not make the singing smoother when those samples are joined together? And if they're clearer, that's always a plus.
4. What if the Appends are not actually Vocaloid software and instead are exclusively some new software similar to Voidol?
We saw it being demonstrated in real time as a VST, but I realized it's strange/confusing. If you buy a voice, do you get a Vocaloid version AND a Cherry Pie version? (Because Cherry Pie uses patterns from deep learning, it wouldn't be attached to the Vocaloid software. The patterns would be in the Cherry Pie VST.)
Or are the new Appends only meant for Cherry Pie? But Wat said that you could tune by hand OR use Cherry Pie, so it seems like you have the option to tune like normal or do it by speaking/singing in real time. I'm just so confused.
Are the Cherry Pie patterns actually the voices, though? Or could they be more like alterations to Vocaloid parameter patterns over time? For instance, maybe it could listen to a person and come up with something like:

1 sec: +2 BRE, +3 DYN, -1 GEN
2 sec: -3 CLE, -2 DYN, +2 GEN

The above parameter changes being relative to the person's basic voice--which you would need to set it up with some kind of baseline for, I'd think, as a reference.
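A minimal sketch of that idea, using the example pattern above: each entry is a set of offsets from a baseline, and applying it gives absolute parameter values for that moment. The baseline of 64 and the 0-127 range are assumptions for illustration only, and `apply_deltas` is a hypothetical helper, not anything CFM has described:

```python
PARAM_MIN, PARAM_MAX = 0, 127  # assumption: MIDI-style 0-127 parameter range


def apply_deltas(baseline: dict, deltas: dict) -> dict:
    """Return absolute parameter values for one moment in time.

    Each delta is relative to the baseline (not to the previous moment),
    and results are clamped to the allowed range.
    """
    values = dict(baseline)
    for name, change in deltas.items():
        values[name] = max(PARAM_MIN, min(PARAM_MAX, baseline[name] + change))
    return values


# Made-up baseline standing in for the speaker's "basic voice".
baseline = {"BRE": 64, "CLE": 64, "DYN": 64, "GEN": 64}

# The example pattern from the post: (time in seconds, offsets from baseline).
pattern = [
    (1, {"BRE": +2, "DYN": +3, "GEN": -1}),
    (2, {"CLE": -3, "DYN": -2, "GEN": +2}),
]

timeline = [(t, apply_deltas(baseline, d)) for t, d in pattern]
for t, values in timeline:
    print(f"{t} sec: {values}")
```

Because the pattern only stores offsets, the same pattern could in principle be laid over any voice bank by swapping the baseline.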

(It seems to me like even just watching what a person's voice was doing, translating that into a record of different pitches/etc., and mapping that somehow to the Vocaloid parameters would be a reasonable application for AI/deep learning. Just like it's currently being used to identify people's faces/stop signs/etc. in pictures, or the words people say in recorded audio. I could imagine CFM having something like a library of audio samples. Some people had to tune a voice to match each sample using traditional means, and they then fed the samples and VSQXs to an AI so it could learn to do what the human tuners did.)

If Cherry Pie could do that, it could lay that pattern on top of the track regardless of the actual VB being used. You could then use the changes it made, or tweak them yourself manually the usual way.

Of course, this is all me just doing some blue sky theory :miku2_move:.
 

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
Regarding the samples, I assume there's some benefit to the sample size. Hard drive space is a sunk cost, but if the samples are shorter, might it not make the singing smoother when those samples are joined together? And if they're clearer, that's always a plus.
I could be wrong, but I think the shorter samples were chosen because shorter samples = less engine noise (I THINK it's because the sample loops more than once depending on how long the note is). They originally went with longer samples for V4 because they were inspired by the UTAU named Gahata Meiji, but it didn't turn out very well. But for the Appends, they are using extremely short UTAU-style samples.
The decision isn't related to computer space, I think; Wat never mentioned it as being a reason. Maybe the shorter samples also work better with Cherry Pie?
==============
Okay, I legit have a tinfoil hat theory on what exactly is going on. I've been brooding over it since yesterday. (First world problem, I know. But it's related to my translation efforts spanning about 2 years, my money, and my remaining sanity.) I am going to attempt to convince myself and other people that Cherry Pie isn't related to Vocaloid. Is it true? I don't know.

Reminders:
Link to Cherry Pie demo
Link to nyanyannya's tuned demo
Link to nyanyannya's untuned demo

I went back and paid close attention to the Vocal Drive/Cherry Pie demo to break down exactly what we are looking at.

Vocal Drive:
[Image attachment 2102]


About Cherry Pie and its pitch control and spectral control:
[Image attachment 2096]

^ Doesn't editing the spectral envelope dials sound like Vocal Drive? What if Vocal Drive is just the easy way for normies to make voice growl texture?


Cherry Pie's voice conversion:
[Image attachment 2097]


I realized the interface has updated a bit since the March demo:
[Image attachment 2098]


Thoughts on "tuned" vs "untuned" based on nyanyannya's demos:
[Image attachment 2100]


So, the stuff I'm confused about is:
If you buy a new Append voice, is there a folder that has an installer for... Vocaloid AND the new VSTs (Cherry Pie and Vocal Drive and the other mystery ones)???

Or is it like the VSTs built in to Vocaloid 5?
[Image attachment 2101]


You can put the Vocaloid 5 VST in any DAW. You can clearly put Cherry Pie and Vocal Drive into any DAW, too. Does that mean you can edit a Vocaloid's track with Cherry Pie?

My thinking was that Cherry Pie is unconnected to Vocaloid, like how Yoshida (Voiceroid) and Zunko (Vocaloid and Voiceroid) are also in Voidol/R.C. Voice (for real-time voice conversion unrelated to Vocaloid).

I don't know what to think based on what I wrote.
 

Prism

Enthusiast
Jul 18, 2019
525
Hey, I made an account so I can join the conversation. My theory is that it coats one vocal track with another track; think of how Zynaptiq Morph works. It seems pretty divorced from Vocaloid, and only Yamaha Vocaloid's VSTs can be used in the editor, so it's all DAW-based. I wonder if any vocals can be coated with any vocal or if it's strictly the Crypton Vocaloid gang.
 

mobius017

Aspiring ∞ Creator
Apr 8, 2018
2,036
3. If you have to own both the new and old banks, what if the Append only affects/changes the Japanese voices, because Wat literally never said they edited the English banks? (He said they were working toward "multilingualization", but never explained what that meant: whether it was extra phonemes, updating their English banks, or new language banks.)
Something else I noticed: The English-specific VBs aren't on sale on Sonicwire. So Miku V4X and the Miku V4X bundle are on sale, and Luka/Kaito/Meiko, who have English included intrinsically are as well, but Miku English and Rin/Len English aren't on sale.

I'll be sad if that means the upcoming appends aren't including English at all....

I could be wrong, but I think the shorter samples were chosen because shorter samples = less engine noise (I THINK it's because the sample loops more than once depending on how long the note is).
Oh, that's interesting, just because I think it's interesting to know how things work. So sample length might influence the amount of engine noise.... Maybe because of something the software has to do to merge the sounds together--you'd think that longer samples would be better, since you "stitch" less often, if that's how it works. But if it's, like, merging them somehow, then it might be better to work with shorter files.

From another perspective, less overlap/less waste just makes sense as being better for processing somehow, anyway.

Lots and lots of interesting stuff in the screenshots/comments you made in your post! ...Not sure what to think, either. I always assumed the two VSTs would be usable with Vocaloid tracks (and that is what seems to be implied by the comments they made about being able to tune either manually or automatically...isn't it?). I'd think Vocal Drive, at least, is usable with Vocaloid tracks, even if Cherry Pie is aimed more at real-time Vtubers.

One question, though:
If you buy a new Append voice, is there a folder that has an installer for... Vocaloid AND the new VSTs (Cherry Pie and Vocal Drive and the other mystery ones)???
Why do we think that the new VSTs would be included with the appends? Did they say at some point that those projects were connected? They're going on at the same time, but could they be separate?
 

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
Vocal coating theories:
My theory is that it coats one vocal track with another track; think of how Zynaptiq Morph works. It seems pretty divorced from Vocaloid, and only Yamaha Vocaloid's VSTs can be used in the editor, so it's all DAW-based. I wonder if any vocals can be coated with any vocal or if it's strictly the Crypton Vocaloid gang.
I haven't heard of Zynaptiq's Morph program before, it was interesting when I looked it up. The technique for Cherry Pie is called a Vocoder (other programs use this term), but Wat calls it "vocal coating".

I am also wondering about if it's Cryptonloids only or not. Probably?


English included in update?
Something else I noticed: The English-specific VBs aren't on sale on Sonicwire. So Miku V4X and the Miku V4X bundle are on sale, and Luka/Kaito/Meiko, who have English included intrinsically are as well, but Miku English and Rin/Len English aren't on sale.

I'll be sad if that means the upcoming appends aren't including English at all....

One question, though:

Why do we think that the new VSTs would be included with the appends? Did they say at some point that those projects were connected? They're going on at the same time, but could they be separate?
I didn't notice about the English banks not being on sale. Technically, Meiko and Kaito's are included with their V3, though. And the V4X bundles DO include English. It's just that the English alone isn't on sale.

I am just assuming that the VSTs come with the Appends, because a long time ago, Wat said that these "plug-ins" would be free when he was speculating. I also thought that since Cryptonloids come with Piapro AND Studio One, they seem very generous with what they provide for creators. However, I wouldn't be upset if they were sold separately.


New tweets:

Speedy progress for the tweaked voice banks:
These last several months, we formularized a new equation*, and are progressing speedily with reconstructing/refining all of the databases** that were released in the past. When we had Mitchie M listen to sound samples, we were praised, YO!***
*"formularize an equation" means to make an equation formulaic/predictable. In my last post, I said that Cherry Pie (the real-time voice conversion VST) used HMM (Hidden Markov Model). It uses math, and whether a sound is periodic (observable) or aperiodic (unobservable) influences the equations that calculate things like the pitch used by the deep learning patterns.
**In Japanese, database/DB is what they call voice banks.
***I don't know if this is supposed to be English "yo" or Japanese よ/yo (means you're authoritatively emphasizing something, like a verbal exclamation mark), so I just left it alone.

Someone in the comments was speculating they're trying to get the voices ready for Magical Mirai (end of August, basically). I doubt it~!


Wat doing overtime and muttering about plans:
..."We will change the way of thinking regarding (synthesized) voices! We won't match time like we have done so far. We will make the time match with the moment of tone color!"
"Harmonic overtones aren't lined up vertically! They stretch horizontally!!"
I'm muttering endless, troublesome things like this to myself in an empty office....
^ Edit: Don't think about this too hard, it's Wat jargon about software plans.
Wat tweeting on a Sunday again.
 
