
Deep-learning-based CeVIO further in development

uncreepy

🎃
Apr 9, 2018
1,281
USA
tl;dr CeVIO's updated engine that uses deep learning has DAW integration, balances quality/render time better than before (went from taking 10 hours for a 5 min song to now running on a laptop), will probably be called "NeoCeVIO", and the vocals appear to be getting Neo names (ex: "Neo Sasara").

Note:
For those of you out of the loop, check out the English/Japanese/Chinese demos from December 2018 here: Reproducing high-quality singing voice
For the most recent demo using English/Japanese, check out this full length song ("Itsuka Kanarazu"): New singing synthesis demo from CeVIO developer Techno-Speech



On October 9th, Techno-Speech teased the upcoming deep-learning-based "NeoCeVIO" (temporary name) at Meiji Kinenkan (a historic venue used especially for parties and weddings). The event was free and open to the public and featured posters and a software demonstration.
Kazuhiro Nakamura, a researcher at Techno-Speech, ran the booth.


The linked page, "Ministry of Internal Affairs and Communications | Holding of the ICT Innovation Forum 2019" (総務省｜「ICTイノベーションフォーラム2019」の開催), explains that the event was part of the "ICT Innovation Forum 2019", hosted by the Ministry of Internal Affairs and Communications to show off research and development in telecommunications technology.



(Eji is a person who collects a lot of information from Miku/Crypton-related events.)
Eji says that other potential names for the new CeVIO are: CeVIO AI (Techno-Speech apparently used this name before), CeVIO Pro (a user called PSGOZ heard this name), and NeoCeVIO (kM4osM (pronounced "kurosu") heard this name at the event). (I'm going to call it NeoCeVIO, because that's what was heard during this event a few days ago.)



The goal of the update is to give the voices more diverse expression. The other points Eji noted are beyond my understanding of vocal synthesis, so I can't rephrase them.



(Chiteico is a person related to the Synth V sphere and frequently talks to Amano Kei about it.)
Chiteico thinks that while VOCALOID:AI is aimed at pros, NeoCeVIO is aimed at "DTMers" (the Japanese term for Desktop Musicians who make MIDI music)/Vocaloid producers.



KM4osM says that NeoCeVIO's strength is that it balances the quality of the voices and the speed of synthesis. Shifting that balance in either direction changes the product drastically (e.g., high quality but very slow vs. low quality but very fast).

With this tweet, we'll move on to KM4osM's blog post. They appear to be the only member of the vocal synth tinfoil hat brigade who actually went to the event and shared pictures on Twitter. Kazuhiro Nakamura said many people showed up, but I could not find any tweets other than the ones I shared.

(This is a summary of it, not word for word because of time constraints.)

In 2018, Techno-Speech showed their deep-learning-based vocal synthesis that sounded human.
Upon seeing Kazuhiro Nakamura's tweet, KM4osM felt an obligation to go to the event.
Meiji Kinenkan had a regal air about it, which made sense given that it was a Ministry of Internal Affairs and Communications event.
At the venue, this was the booth (Nakamura was there):
[Images: cevio1.png, cevio.png]

IT HAS DAW INTEGRATION!!!!!!!!!!!!!!!!! (#1 impression) (It was being used with REAPER at the event.)
It seems that NeoCeVIO is closer to being a full product that can be used with a DAW (and maybe standalone?).
On top of that, NeoCeVIO was even running on a laptop!
Last year around March, it took 10 hours to render a high-quality 5-minute song ("Itsuka Kanarazu", linked at the start of this post). Now the render time has been successfully reduced.
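For a sense of scale, synthesis speed is often described with a real-time factor (RTF): time spent synthesizing divided by the length of the audio produced, so anything at or below 1 counts as real time. Here's a minimal Python sketch using only the rough figures above (10 hours for a 5-minute song); the function name is made up for illustration, and we don't know the new version's actual numbers.

```python
def real_time_factor(render_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by the duration of the audio produced.

    RTF > 1: slower than real time; RTF <= 1: real time or faster.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return render_seconds / audio_seconds


# Rough figures from the post: about 10 hours to render a 5-minute song.
old_rtf = real_time_factor(render_seconds=10 * 60 * 60, audio_seconds=5 * 60)
print(f"RTF = {old_rtf:.0f}")  # RTF = 120
```

In other words, the 2018 setup needed roughly 120 seconds of computation per second of singing; "running in real time" would mean getting that down to 1 or less.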

[The main question]
From what KM4osM could see, NeoCeVIO (temporary name) seemed to be quite complete. (Apparently the singing sounds good even without tuning, so you could compose late at night and feel like "the future is here".)
The parameters it had were volume, pitch, timing (duration), and vibrato, so it isn't very different from the current CeVIO. KM4osM wasn't sure whether they will add or adjust parameters, since this is a temporary version.
At the booth, "Neo Sasara" (temporary name) sang a cover of "Ai o Komete Hanataba o" by Superfly.

^ This is not the Neo Sasara version; it's the original human version.

Neo Sasara seemed able to sing in a calm tone of voice, and you could feel the expression in her singing. The pitch varied, and there was subtle "shakuri" (a Japanese singing technique where you start slightly below the note's pitch and ease into the "correct" pitch).

[When will it become a product?]
We don't know. The engine seems fine, but there were things that needed work with the GUI.
 

Kazumimi

Guess you could say I’m a TEIniac?
Staff member
Moderator
Sep 21, 2019
1,063
www.robocheatsy.com
Sounds pretty interesting! I'll have to check out the tweets and articles when I've got the chance. It seems like all the popular vocal synths (except for UTAU, maybe) are really working to improve what their software can do!

Thanks for sharing the information, Uncreepy!
 

xuu

Resident Medium⁵ Fan
Apr 8, 2018
558
20
UK
YES YES YESS FINALLY NEWS IT'S BEEN SO LONG

I will go back to being a full time CeVIO shill on the condition they release the Superfly cover as a demo song, it's literally one of my favourite songs.
 

lIlI

Staff member
Moderator
Apr 6, 2018
257
The Lightning Strike
This is sooner than I expected!

uncreepy said:
Chiteico thinks that while VOCALOID:AI is aimed at pros, NeoCeVIO is aimed at "DTMers" (the Japanese term for Desktop Musicians who make MIDI music)/Vocaloid producers.
Chiteico, my life is in your hands buddy :miriam_lili:
 

YOYo_MAMA

I am Thou and Thou aren't Shit...
Oct 19, 2018
29
After a hundred years!! They finally gave us an update.
I was starting to lose hope on this project. I just hope the audio of NeoCeVIO will be released somehow. I really want to hear how the new software sounds so far.
 

Prism

Passionate Fan
Jul 18, 2019
201
I don't know if it's just me, coming from an animation background where it takes hours to render a single frame, but I think 5 hours for that quality is well worth the time.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
Prism said:
I don't know if it's just me, coming from an animation background where it takes hours to render a single frame, but I think 5 hours for that quality is well worth the time.
I wonder if they had to just export it and hope it sounded good, or if they could tune it and THEN export it? They said they had to fix up the errors with Melodyne, so part of me wonders if the results of "Itsuka Kanarazu" were not able to be controlled. With NeoCeVIO, KM4osM said there were the same old tuning parameters, so why would they have to fix the errors with the pre-commercial/pre-NeoCeVIO version from April through Melodyne? Doesn't it almost seem as if the way the software ran before NeoCeVIO was somehow completely different?

That being said, I wouldn't mind having to wait 5 hours for very human-sounding results. Even if it took that many hours, that would be about the same time it takes to tune a song by hand. But if there were no way to preview the song before a 5-hour export, followed by fixing it in Melodyne, that would make it slightly less appealing.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
Eji contacted me on Twitter about this thread regarding the upcoming CeVIO. He wrote to me in English, but asked me to rewrite what he wrote before sharing it with everyone. So, I will put a quote box around Eji's information, even though they are rephrased quotes and not direct quotes.

kM4osM called it "NeoCeVIO" on his own. Eji contacted a CeVIO team member through DM, and they said they do not have any official name.
Dang it, I really liked the sound of "NeoCeVIO" and "Neo Sasara". I was all on board for that being true.
:ring_ani_lili:


PSGOZ did not go to the event on October 9th at Meiji Kinenkan. But he tweeted "CeVIO Pro" on December 14th, 2018 when he saw this news release:
The DTM Station article is called "Revolution in singing voice synthesis! AI singing synthesis system sings just like a human through deep learning developed by Nagoya Institute of Technology and Techno Speech". In the article, they compare the new CeVIO to other vocal synths, mainly noting that VOCALOID sounds inhuman, while CeVIO will revolutionize the vocal synth world by sounding human (though it isn't ready to be sold). They also talk about Microsoft's virtual singer, Rinna. CeVIO's wav files used to sound monotonous, but with the new CeVIO they are more dynamic and expressive.

Old CeVIO:


New CeVIO (Well... 2018 new. Might not reflect the version that is now closer to commercial sales):

Note: I did not see DTM Station call CeVIO "CeVIO Pro" anywhere in the article.


For this tweet, I said I wasn't able to translate the technical stuff, but Eji explained it for me:
The goal of the CeVIO update is to speed up the very slow synthesis of the CeVIO AI that was shown publicly in December 2018. The following article (in English) explains that CeVIO was based on CNNs (convolutional neural networks). The new CeVIO uses DNNs (deep neural networks) to improve the naturalness of the synthesized singing voice (the new CeVIO, based on a DNN model, can synthesize natural-sounding singing): [1904.06868] Singing voice synthesis based on convolutional neural networks
Objective: CeVIO wants the singing voice to have diverse expression, but also wants it to be synthesized quickly. So they model dependencies over time with a CNN instead of using a recursive structure. (Recursive means it loops back on/chains to itself.)
The previous quote is a rough description of a paper from the Acoustical Society of Japan's 2019 Autumn Meeting. The paper is a PDF that non-members of the ASJ can pay for. (I won't link to it, because no one who doesn't know Japanese and Japanese vocal synth lingo would be able to read it.)
So we don't have the details those professors have. We only know that it is much faster and can reach real time, just like the current CeVIO that is already on the market and can run on a notebook PC.
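Eji's speed point (modeling dependencies over time with a CNN instead of a recursive/recurrent structure) can be illustrated with a toy NumPy sketch. This is not Techno-Speech's model: the sizes, random weights, and function names are all made up. In a convolution, each output frame depends only on a fixed window of input frames, so every frame can be computed in parallel; in a recurrent network, each frame depends on the previous frame's output, which forces a frame-by-frame loop.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 200, 8, 4  # frames, input features, output features (toy sizes)
x = rng.standard_normal((T, D))  # stand-in for a score-feature sequence

kernel = 0.1 * rng.standard_normal((5, D, H))  # 5-frame window
w_in = 0.1 * rng.standard_normal((D, H))
w_rec = 0.1 * rng.standard_normal((H, H))


def conv_over_time(x, kernel):
    """1-D convolution over the time axis (deep-learning style, 'same' padding).

    With a 5-frame kernel, output frame t uses only input frames t-2..t+2,
    never other outputs, so all frames are computed in one vectorized step.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    windows = np.stack([xp[i:i + len(x)] for i in range(k)])  # (k, T, D)
    return np.einsum("ktd,kdh->th", windows, kernel)


def recurrent_over_time(x, w_in, w_rec):
    """Plain RNN: frame t needs the hidden state from frame t-1,
    so the frames have to be computed one after another."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for frame in x:  # sequential by construction
        h = np.tanh(frame @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs)


def changed_frames(f, x, frame=50):
    """Indices of output frames that change when one input frame is nudged."""
    x2 = x.copy()
    x2[frame] += 1.0
    diff = ~np.isclose(f(x), f(x2))
    return np.nonzero(diff.any(axis=1))[0]


conv_changed = changed_frames(lambda s: conv_over_time(s, kernel), x)
rnn_changed = changed_frames(lambda s: recurrent_over_time(s, w_in, w_rec), x)
print(conv_changed)     # [48 49 50 51 52]
print(rnn_changed[:2])  # [50 51]
```

Nudging input frame 50 only changes the convolution's outputs near frame 50, while the recurrent version's change starts at frame 50 and is carried into later frames through the chain of hidden states. That chain is what forces the step-by-step loop.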
But it wasn't quite the same as the version shown publicly in December 2018, which had the 2 kinds of engines shown below:

It had a CNN+V version and a CNN+W version.
V means "traditional vocoder" and W means "WaveNet vocoder"; the latter produces results almost identical to a human voice, with a mean opinion score (MOS) of 4.23.
CNN+V was presented at the ASJ meeting in September 2019 and shown to the public in October 2019.
It is a version of CeVIO with significantly improved quality, but not the same quality as shown on the December 2018 website (these demos: Reproducing high-quality singing voice )
We don't know how good CNN+V was, since it wasn't compared directly with the CNN+W from the website demos, but since it used a traditional vocoder, it cannot break the barrier of MOS 4.
With its default values, it may be better than songs known in the Japanese vocal synth community as 「神調教」 ("godly-tuned"), since it is machine-learned from a human singer's voice.
But it is still within reach of hand-tuned VOCALOIDs, since it uses the same vocoder-type synthesis.
Theoretically.
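For anyone unfamiliar with MOS: a mean opinion score is the average of listeners' ratings on a 1-5 scale (on the usual absolute category rating scale, 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad), often reported with a confidence interval. Here's a minimal Python sketch; the ratings are invented for illustration and are not from Techno-Speech's listening tests, and the interval uses a simple normal approximation.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    ratings: individual listener scores on the 1-5 rating scale.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from ten listeners (made-up data).
ratings = [5, 4, 4, 5, 4, 3, 5, 4, 4, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # MOS = 4.30 ± 0.42
```

So an MOS of 4.23 means listeners rated it, on average, between "good" and "excellent", which is why getting past 4 is treated as the barrier here.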
And it is fully automatic and does not need human tuning. The live demo (on the 9th) showed the software running on a notebook PC, and it was not edited while being operated at the show. (Otherwise, people who attended, like kM4osM, would have said "We do not know if it can do that, since we didn't see it," so we can assume that it can.)
So, to sum it all up, the "NeoCeVIO" name is not official (RIP). The software behind the new CeVIO has changed a bit since the old demos from last year. But it is now fast and even a notebook PC can run it. And it will sound good even without tuning it due to the DNN.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
Adding the info on here cause I dunno where else to put it.

Techno-Speech has set up a new Twitter account for CeVIO/vocal synth technology: テクノスピーチ(Techno-Speech) (@techno_speech) | Twitter

Their icon is by chie_rico. Both the artist's description and Techno-Speech's description say this is for their Twitter avatar, but it seems weird to put in that much effort to design a character simply for a tiny Twitter icon. (Please be a new vocal synth character!)

Her hair clip says TS and she has a pink/blue futuristic design.
Edit: Her pink and blue are the colors of the CeVIO icon.

So far, Techno-Speech has tweeted about vocal synths in general, such as the NHK Special about VOCALOID: AI, and has said that Techno-Speech was created 10 years ago, in 2009, and that other Japanese Vocaloids like Luka, Kiyoteru, Yuki, miki, and GUMI are their classmates (dunno why the emphasis is on AHS lol).
 

Ulysses

from VOICeVIO
May 4, 2018
39
github.com
As an old CeVIO fan, I'd predict that this character already has a voice DB (Talk; not sure about Song).
And it's likely that she will show up in CeVIO.
 

xuu

Resident Medium⁵ Fan
Apr 8, 2018
558
20
UK
I think she's likely to be a new flagship character, either alongside Sasara or replacing her as the mascot of the software. I wouldn't be surprised if they revamp their sales model and include her as a default DB with CeVIO's new update. It would be cool if she got a Song Voice that represents the advancements they've made. Maybe a bit VY1 to Sasara's Miku?
 

Trevor

?
May 2, 2018
78
They might not have been able to rerecord Sasara's vocals to meet the new quality precedent CeVIO is trying to set. Like, she may well be included, or even purchasable just for fans, but CeVIO has been trying, and is still trying, to keep up with a modern user base. EDIT: After researching Sasara's VP (voice provider), it appears her career took off two years later and she now has a singing career. Scheduling a recording session more grueling than the last would be expensive and difficult, even if the VP were still interested in a product that might compete against her. Especially for a free voice.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
They actually used Sasara in the demo at the Ministry of Internal Affairs and Communications event (first post in the thread) and said her voice became much more expressive/human. I don't doubt that they would make a more modern voicebank with a new character to help launch the engine, but it seems that even old voicebanks can benefit from the new features of the new CeVIO engine. Plus, for the English/Japanese/Chinese demos back in December 2018, they even used IA's English voice with the new engine: Reproducing high-quality singing voice (Now that I think of it, they used both Sasara and IA on the world's first deep-learning-produced CD, which featured the song "Itsuka Kanarazu".)

I'm curious how the Color Voice Series would work with the new CeVIO.
 

Prism

Passionate Fan
Jul 18, 2019
201
Everyone seems to forget about them. I wonder how the enka singer ones will sound. Also, the green one is best girl.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
Techno-Speech is having another demo/presentation about the new CeVIO.


It will be on December 5th from 1:30 PM to 5 PM in Nagoya. The presentation will be by Kazuhiro Nakamura, who will give an oral presentation followed by a poster presentation. The contents are basically the same as the presentation in October (see the 1st post in this thread).


Hopefully more people see this and share pictures/explanations of what they saw.
 

xuu

Resident Medium⁵ Fan
Apr 8, 2018
558
20
UK

CeVIO CS7 is coming. I'm unsure whether this is the deep-learning update in full or not. CeVIO will become a 64-bit application, and there will be support for ruby text and splitting/combining accents (?) in Talk. For Song, what I *believe* it's saying is that the "voice quality" or gender parameter will be usable on the time axis like the other parameters. For English Song voices, words will be automatically split into their syllables. Beta testing will open in a few days.
 

uncreepy

🎃
Apr 9, 2018
1,281
USA
My translation of the notice:

[Notice]
Development of the next big version update, "CeVIO CS7", is now coming to a close.

Planned features include a 64-bit application, splitting/joining of pitch accents in the Talk section, ruby assignment for lines of speech, detailed time-axis tuning of singing voice quality, automatic splitting of English lyrics into syllables, etc.

We plan to start beta testing soon, so please look forward to it!!
 
