
CeVIO CeVIO's deep-learning-based vocal synth coming "soon" + buy Satou Sasara 50% off code until end of Feb.

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
DTM Station wrote an article called "Just a bit left before AI singing synthesis!? With 6 years of huge progress and becoming CeVIO Creative Studio 6. CeVIO Song & Talk Starter (Satou Sasara) is about half price until the end of February for 5,538円 ($49.96)" that I've partially translated. I only translated the stuff relevant to the "new" CeVIO and ignored stuff about current CeVIO.

Note: DTM Station is a Japanese news/educational site that talks about things like Vocaloid, DAWs, MIDI, and other audio/music-related production stuff.

How to buy CeVIO Song & Talk Starter (Satou Sasara) with the 50% discount from DTM Station:
Warning:
The sale only lasts through February.
  1. Go to this shop (SOURCENEXT) that DTM linked to in the article.
  2. On the top right, click the yellow button that says カートに入れる (put in cart).
  3. In the クーポンを入力 (enter coupon code) spot, put in the code DTM_1902 and click 適用 (apply) so the price goes from 7,538円 ($68.01) to 5,538円 ($49.96).
(It is already 30% off at this shop compared to the other shops like Amazon and Vector, so you are applying even more savings with the promo code from DTM Station.)
(On Amazon and Vector, the current price is 10,800円 ($97.44) right now, so this is a huge deal.)
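To double-check the savings, here is a quick sketch in Python using only the yen prices quoted above. The variable and function names are made up for illustration.

```python
# Prices in yen, taken from the steps above.
LIST_PRICE = 10_800    # Amazon / Vector price
SHOP_PRICE = 7_538     # SOURCENEXT price before the coupon
COUPON_PRICE = 5_538   # SOURCENEXT price after applying DTM_1902


def percent_off(original: int, discounted: int) -> float:
    """Return the discount as a percentage of the original price."""
    return (original - discounted) / original * 100


shop_discount = percent_off(LIST_PRICE, SHOP_PRICE)
total_discount = percent_off(LIST_PRICE, COUPON_PRICE)
coupon_value = SHOP_PRICE - COUPON_PRICE

print(f"Shop price vs. Amazon/Vector: {shop_discount:.1f}% off")
print(f"Coupon price vs. Amazon/Vector: {total_discount:.1f}% off")
print(f"The coupon itself takes off {coupon_value} yen")
```

This works out to about 30.2% off at the shop and about 48.7% off with the coupon, so "about half price" is measured against the Amazon/Vector price.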

Note: You only have to buy a starter once. Since Sasara has both a talk and song voice, it unlocks both of those portions of the software. So then you just have to buy either talk or song voices for other characters like ONE song voice or Takahashi talk voice instead of starter packs for them.

What DTM Station had to say about the "new" CeVIO:
At the end of last year, AI singing synthesis became a big topic with our article "Revolution in singing synthesis technology! AI singing synthesis system that sings just like a human with deep learning, developed by Nagoya Institute of Technology and Techno-Speech, Inc."
This was the article:
"Revolution in singing voice synthesis! AI singing synthesis system sings just like a human through deep learning developed by Nagoya Institute of Technology and Techno Speech" was uploaded to DTM Station. Vocaloid has a singing ability from a different perspective, progressing to a level where it's hard to tell the difference between AI and human singing.
Links to this article: 歌声合成技術に革命!ディープラーニングで人間さながらに歌うAI歌声合成システムを名工大とテクノスピーチが開発|藤本健の “DTMステーション”
The article talks about different vocal synths like Microsoft's Rinna, Vocaloid, and CeVIO.

Now, back to the new article...
It says that CeVIO became ver 6 in December and has been continuing to develop (now they're at ver 6.1.31 as of today with free updates as usual).

This is the demo of what Satou Sasara's voice was used for related to deep learning:

The future version of CeVIO developed by Nagoya Institute of Technology and Techno-Speech, Inc. uses the HMM system (the hidden Markov model) (which appears to be used by Microsoft for their speech synthesis also).

Chart DTM Station had explaining HMM in the most simple way possible that I translated:
[Attachment 1890]

Caption on the image:
Singing synthesis with HMM implemented using human voice physics modeling
In other words, HMM is used to simulate a vocalizing human voice: how the vocal cords vibrate during pronunciation, the shape of the throat, how open the mouth is, and so on are all calculated. A big feature of CeVIO Creative Studio's engine is that the data size is small.
[Attachment 1889]
Look at them tiny file sizes. DTM Station proceeds to diss Vocaloid 5's voice bank file sizes and uses Amy as an example for being 2.2 GB, saying the difference in Vocaloid and CeVIO's software is very big (literally).
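For anyone wondering why a statistical voice can be so small, here is a toy sketch in Python. It is not CeVIO's engine, only a heavily simplified left-to-right state model with Gaussian outputs. Each state stores a few summary numbers instead of recorded audio. The state values, the fixed durations, and every name in it are invented for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class State:
    """One model state: only summary statistics, no recorded audio."""
    mean_f0: float  # average pitch in Hz (made-up value)
    std_f0: float   # spread of the pitch around the mean
    frames: int     # how long the state lasts (fixed here for simplicity)


# A left-to-right model for one hypothetical sung syllable.
SYLLABLE = [
    State(mean_f0=220.0, std_f0=3.0, frames=5),   # onset
    State(mean_f0=262.0, std_f0=2.0, frames=20),  # sustained note
    State(mean_f0=247.0, std_f0=4.0, frames=8),   # release
]


def generate_f0(states, seed=0):
    """Walk the states in order and sample a pitch value for every frame."""
    rng = random.Random(seed)
    track = []
    for state in states:
        for _ in range(state.frames):
            track.append(rng.gauss(state.mean_f0, state.std_f0))
    return track


f0_track = generate_f0(SYLLABLE)
stored_numbers = 3 * len(SYLLABLE)  # three stored values per state
print(f"{len(f0_track)} frames generated from {stored_numbers} stored numbers")
```

A real system models many more parameters than pitch and learns them from recordings, but the same idea applies: the voice bank holds statistics, and the audio is generated from them on demand.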

So far, CeVIO has been through 6 versions, with many (always free) updates in between to improve the software, dictionary, and voices.
 


xuu

long suffering synth fan
Apr 8, 2018
671
24
UK
Sasara sounds gorgeous with the new HMM synthesis. Looking forward to the big update.
 

Twillby

Longtime Listener
Apr 8, 2018
409
33
US
I don't need the song editor but dang that deal is still cheaper than buying a talk starter at normal price... what's an extra copy I doubt I'll be able to get the money in time but you NEVER KNOW

I'm so nervous-excited about the update ahhhh
 

Heruru Meruru

THE v@SHFUCKER Dearly Stars
I, for one, would welcome our new CeVIOverlords if the editor only came in English. :(

edit: Huh, it seems it is in English now. I'm still very attached to the Vocaloid characters and I'm not letting go of them, though.
 
  • Like
Reactions: sour_supreme

Kona

Avanna's #1 Fan
Apr 8, 2018
814
USA
I, for one, would welcome our new CeVIOverlords if the editor only came in English. :(
The editor is in English actually! As of CeVIO CS6.1, the interface is fully in English and it all works very well!



My own personal excitement is overflowing about this. I really wanna get this, but I dunno if I want to get Sasara even with this deal...I’m saving up right now for a mic. I wish I could though, but I already have IA English and want to just save up for IA Talk later
 

xuu

long suffering synth fan
Apr 8, 2018
671
24
UK
My own personal excitement is overflowing about this. I really wanna get this, but I dunno if I want to get Sasara even with this deal...I’m saving up right now for a mic. I wish I could though, but I already have IA English and want to just save up for IA Talk later
Since you already have IA English, it'd probably be better to just wait for them to go on sale on Vector like they usually do. Sasara song voice standalone usually goes to like ~3000JPY?

I hope this brings a lot of publicity to CeVIO since it does seem like most people still don't realise it has a fully English interface now. I hope we get to see some new voices soon too.
 
  • Like
Reactions: Twillby

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
I found more info. My collection is messy cause it took a lot of work to round it up. It's basically translated notes.


Nagoya Institute reveals the deep learning-based CeVIO will be demoed in March:
Source: 超高品質な歌声を再現するAI歌声合成システム~名工大と大学発ベンチャーが共同開発に成功~|国立大学法人名古屋工業大学
Title: Ultra-high quality singing voice recreated by an AI singing synthesis system ~ Successful joint development by Nagoya Institute of Technology and a university-launched venture ~

International spoken language technology research

They have developed singing synthesis technology that can recreate a human's voice quality, mannerisms, and way of singing with high precision. For this research, they applied statistical machine learning such as deep learning to about 2 hours of a specific singer's singing waveforms to learn that singer's voice quality, mannerisms, and way of singing.

As a reminder, the languages are: Japanese (Satou Sasara), English (IA English), Chinese (unknown)
None of the samples have had their tuning tweaked, this is to show what the software can do all on its own without help by a producer to tune it to sound better. So theoretically, it could sound even better when tuned than these already wonderful examples demonstrate.
The research results will be exhibited in March at the Acoustical Society of Japan 2019 Spring Meeting. Based on the results of the research, they are aiming to improve the speech-related technology and services available even further.

The Acoustical Society of Japan reveals it will be demoed March 5th:
Source: 一般社団法人 日本音響学会 -- The Acoustical Society of Japan --

There are 3 presentation days: March 5th, 6th, and 7th.

I think I found them on the first day on p. 28:
1-9-15
A study of adversarial learning for emotional speech synthesis based on deep neural networks
^ That's Nagoya Institute's segment

I had to look through every page of the pdf to find it, cause it's not searchable. The text is all images and can't be copied and pasted. So, I do not want to have to hand write all the kanji in order to translate their segment fully. I mean, I could, but I don't want to. : P

This is what "adversarial learning" means:
Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input.
^ From Adversarial machine learning - Wikipedia


Things the Techno-Speech, Inc. website reveals they can provide in terms of vocal synthesis for customers (which I assume CeVIO will get):
Sources: テキスト音声合成ソフトウェア開発キット and 歌声合成ソフトウェア開発キット

a) Emotion sliders
An engine that can have different voices like "normal, happy, bashful, angry, sad" and more sliders. The parameters are statistical and can make natural-sounding speaking styles.

b) Can add new words to the dictionary and can tweak pronunciation

c) The engine is high speed and light weight on your system

It doesn't have much delay. Even computers that aren't high performance can use it (and smartphones, too, both Android and iPhone). The data size is extremely small since the voice synthesis method is built on statistical modeling. It uses little memory and puts little load on your computer.

d) Simple interface
Easy to understand. Voice pitch, volume, speed, age (from kid to adult) can be adjusted. Can adjust down to the phoneme unit.

e) Can export in various file types.

f) Multi-lingual support

They basically only have worked with Japanese, English, Chinese, and need to work with the client in order to do other languages.
Japanese uses kana, English, Chinese uses pinyin, Taiwanese Mandarin uses Bopomofo.

g) Operating system:
They only have worked with Windows and C++ and need to work with the client for other systems.

They used CeVIO as an example of application and had mysterious screen shots:
Note that it is called "CEVIO Creative Studio S" instead of 6 on the 1st 2 screen shots. (But it doesn't really look different to me. Must just be back end stuff that's different.) They are using Satou Sasara.

[Attachment 1892]
^ Talk voice

[Attachment 1893]
^ Song voice

[Attachment 1894]
^ Emotion ratio slider for Windows

[Attachment 1895]
^ Emotion ratio slider for smartphone

Necessary data size: The voice banks are 1~3MB and the dictionary (Japanese) is 70MB
Singing data size is 1~5MB, lyric analysis dictionary is 0.1MB (Japanese)
 
  • Like
Reactions: ngoomie

xuu

long suffering synth fan
Apr 8, 2018
671
24
UK
It will be interesting to see if they include the emotion sliders in the Song engine rather than just the Talk engine as before. Not long to go now!
 

Ulysses

from VOICeVIO
May 4, 2018
40
Those screenshots are quite old... Should have nothing to do with the "great update".

I think for current engine it's already possible to enable "emotion" for song voices (since it's just a remix of two emotional voice's outputs with different ratio, just like what they do with Talk voices) but - it's just something like Miku Dark and Vocaloid XSY feature... all old things.
 
  • Like
Reactions: xuu

uncreepy

👵Escaped from the retirement home
Apr 9, 2018
1,618
Those screenshots are quite old... Should have nothing to do with the "great update".

I think for current engine it's already possible to enable "emotion" for song voices (since it's just a remix of two emotional voice's outputs with different ratio, just like what they do with Talk voices) but - it's just something like Miku Dark and Vocaloid XSY feature... all old things.
The reason I translated a bunch of that stuff was because I can't tell if it has any important information hidden in it by just glancing at it like I could for English. (Vocal synth related vocab is getting more normal for me, but it's a lot of work.) Also kind of to document what might change, I guess.
:ring_ani_lili:

I messed up and forgot that CeVIO was called S several years ago. (I'm only human and I wrote that at like 5 am. At least the March 5th-related stuff was useful.) Everything seemed old, I was thinking that maybe the interface of CeVIO will be staying the same and only back end stuff will change? (With the deep learning sample dictionary stuff.) And maybe the available languages seemed useful, so we don't look forward to CeVIO Korean or something.

Unless I'm really dumb and can't find it, CeVIO currently won't let you edit the emotion of the singing voices.

CeVIO is made by Techno-Speech, a venture business from within Nagoya Institute of Technology. So I guess that means that NIT made Techno-Speech, Techno-Speech made CeVIO, and they are both working on the new CeVIO.

I thought it was interesting that they're offering clients their software style, with the sliders and the way emotion voices are made, for things like GPS, games, or deceased vocalists. The Color Voice Series is from JOYSOUND, for example.
 

Ulysses

from VOICeVIO
May 4, 2018
40
CeVIO currently won't let you edit the emotion of the singing voices.
Sure, CeVIO doesn't provide such feature, but I believe it's just disabled since there are currently no such voices.

I think for current engine it's already possible to enable "emotion" for song voices
I was talking about the engine (SVSS). This feature was enabled on the other software - ScoreMaker Zero. There are 2 voices included: Yoko (Sinsy Yoko) and "Male" Yoko (I called him "Yokun"). You can mix these 2 voices with a gender slider.
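To illustrate the "mix two voices with a ratio" idea, here is a minimal Python sketch. It assumes the mix is a simple linear blend of per-frame parameters. That is an assumption for illustration only: nothing here is taken from SVSS, CeVIO, or ScoreMaker Zero, and the pitch values are invented.

```python
def mix_voices(voice_a, voice_b, ratio):
    """Linearly blend two equal-length parameter tracks.

    ratio = 0.0 gives voice_a only, ratio = 1.0 gives voice_b only.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if len(voice_a) != len(voice_b):
        raise ValueError("tracks must have the same number of frames")
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(voice_a, voice_b)]


# Invented pitch tracks (Hz) for the same phrase rendered by two voices.
female_f0 = [440.0, 442.0, 438.0]
male_f0 = [220.0, 221.0, 219.0]

halfway = mix_voices(female_f0, male_f0, 0.5)
print(halfway)  # [330.0, 331.5, 328.5]
```

Moving the ratio toward 0 or 1 slides the result toward one voice or the other, which is roughly what a gender or emotion slider would expose to the user under this assumption.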
 
