Question V5 vocal fry-- how did they do it?

Vegetaljuce · Oct 27, 2019

Ever since Vocaloid5 came out, I've been wondering how they achieved the vocal fry effect. Before the release of V5, I assumed that the only way to achieve a convincing vocal fry was to record actual fry samples from the voice provider and work them into a voicebank through something similar to cross-synthesis. Given this, I believed that if vocal fry was ever added to Vocaloid's repertoire, it would only work with new banks that had recorded some fry samples.

However, it became clear that my assumptions were totally wrong when V5 announced that the vocal fry feature was backward compatible with old voicebanks. And it's driving me crazy wondering how they did it.

The only decent example of an attempt at vocal fry on a vocaloid before V5 was posted by the producer PSGOZ back in 2015, and even then, it's a little rough around the edges and not as natural-sounding as what Vocaloid5 was able to achieve. Additionally, afaik, PSGOZ never revealed how he did it, so no clues there either.

TL;DR: Vocaloid5's vocal fry is artificially generated. Do any of you have any insights on how exactly this effect might be accomplished under the hood?

___ · Oct 27, 2019

Is it really artificially generated or is that your assumption?

Perhaps it works the same as the breaths, they had one or two people record the vocal fry and it gets reused for every voicebank compatible with V5.

Vegetaljuce · Oct 27, 2019

patuk said:
Is it really artificially generated or is that your assumption?

Perhaps it works the same as the breaths, they had one or two people record the vocal fry and it gets reused for every voicebank compatible with V5.

I’m 90% confident it’s artificially generated and not a default set of recordings like V5’s breaths.

I could, of course, be wrong, but this is my reasoning:

1. If the vocal fry sounds were prerecorded, I’d expect to see a way to toggle between male and female like you do with the breaths. I feel like using a male fry recording on a female vocal like Miku would sound just as alien as using the male breaths with her, which is why you’re given an option to choose either male or female breaths. I don’t think this is the case with fry, as far as I know.
2. Even with the proper gender selected in the breath settings, lots of people including myself find V5’s default breaths just don’t match or sound right with a lot of voicebanks, particularly juvenile-sounding ones. By contrast, every example of fry I’ve seen seems to match the character and timbre of the voicebank it’s being used on quite well. I’m not convinced the fry effect would match different voicebanks as well as it does if it weren’t constructed using the bank itself.
3. The general logistics of applying one or two default fry recordings over vocals from a different voice provider. Vocal fry is voiced (although less so than normal singing), unlike breathing, which is largely unvoiced. A fry sample retains some of the voiced vocal characteristics of whoever recorded it, so I struggle to imagine how default samples could be naturally applied to various voice types.

___ · Oct 27, 2019

Hmm, you did bring up some good points.

Well the vocal fry can be found in the attack and release options in V5 and correct me if I'm wrong but I believe V1 has the attack feature too? Perhaps you could try digging up some stuff about V1 that could be helpful??? But I dunno.

Vegetaljuce said:
Even with the proper gender selected in the breath settings, lots of people including myself find V5’s default breaths just don’t match or sound right with a lot of voicebanks, particularly juvenile-sounding ones.

Aaah, the breaths can be tricky but you can make it work. For Una I just tweak the breath settings to make them less strong and... I think she sounds alright that way. The breaths on default settings are way too strong for any voicebank really, even the 4 default ones.

Kona · Oct 27, 2019

It could just be an effect. Something similar to how Growl was produced on V4 voicebanks, even on ones that, from a more logical standpoint, shouldn’t be able to growl with their tone (the softer/whispery VBs, since it’s just...contradicting in technique).

Vocal fry, though, is a lot easier to get away with than growl, which ended up sounding weird and unnatural more often than not. Vocal fry is the lowest register of a voice, and a breaking point in the voice. If you do a straight vocal fry on your own voice, you’ll notice your tone isn’t really there much at all. Going to a note in fry is really just going from that extreme low register to the note, so you shift registers doing it. The vocal folds behave differently in singing and fry, so that could be why it’s easier too.
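
To make the "lowest register, not much tone" point concrete, here is a rough sketch of what a fry-like signal looks like as data: a slow, slightly irregular train of short damped pulses instead of a steady pitched wave. This is only an illustration and not how Vocaloid produces fry. The function name and every number in it (40 Hz mean pulse rate, 20% jitter, 700 Hz ring, 4 ms pulse length) are assumptions picked for the example.

```python
import math
import random


def fry_pulse_train(duration_s=0.5, sample_rate=16000, mean_f0=40.0,
                    jitter=0.2, seed=0):
    """Return (samples, pulse_starts) for a crude fry-like pulse train.

    All parameter defaults are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)              # seeded so the result is repeatable
    n = int(duration_s * sample_rate)
    samples = [0.0] * n
    pulse_starts = []
    pulse_len = int(0.004 * sample_rate)   # each pulse rings for about 4 ms
    period = sample_rate / mean_f0         # average samples between pulses

    pos = 0.0
    while pos < n:
        start = int(pos)
        pulse_starts.append(start)
        for k in range(pulse_len):
            if start + k >= n:
                break
            # Exponentially decaying sine: a short "click" with a little colour.
            envelope = math.exp(-5.0 * k / pulse_len)
            samples[start + k] += envelope * math.sin(
                2.0 * math.pi * 700.0 * k / sample_rate)
        # Irregular spacing is what gives fry its creaky, unpitched quality.
        pos += period * (1.0 + rng.uniform(-jitter, jitter))
    return samples, pulse_starts


samples, pulse_starts = fry_pulse_train()
gaps = [b - a for a, b in zip(pulse_starts, pulse_starts[1:])]
```

With these defaults the gaps between pulses land around 320 to 480 samples, so there is no single stable pitch for the ear to lock onto, which matches the "your tone isn't really there" observation.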

Kazumi · Oct 27, 2019

Vegetaljuce said:
I’m 90% confident it’s artificially generated and not a default set of recordings like V5’s breaths.

I could, of course, be wrong, but this is my reasoning:

If the vocal fry sounds were prerecorded, I’d expect to see a way to toggle between male and female like you do with the breaths. I feel like using a male fry recording on a female vocal like Miku would sound just as alien as using the male breaths with her, which is why you’re given an option to choose either male or female breaths. I don’t think this is the case with fry, as far as I know.

Unfortunately, it is pre-recorded, like the breaths. If you look at the vocal fry in attack and release, they're labeled as "VocalFry F Soft/VocalFry F Power/VocalFry M" (the last 3 on each panel). The F and M stand for female and male. And if you hear how they plug in with other voicebanks compared to Chris and Amy, it's obvious it's pre-recorded and just adjusted per voicebank.

Also if you're looking for good pre/non-v5 growl examples, might I show you Lutie - he sometimes overdoes it, but when it's natural vowels, it sounds great on a ton of his videos, and it's clearly done in Piapro for all of em.

But yeah, unlike the Style Presets or Singing Skill files, they're not adjustable so I think we're stuck with what we have :\

uncreepy · Oct 27, 2019

My unhelpful/vague comment about how V5 works:
The patent for Vocaloid 5 can be found on Google's patent search engine. It explains the algorithms for basically everything and includes diagrams of when each "setting" kicks in. It shows each note block broken into sections, with math and a sine wave next to it to calculate stuff. I really don't feel like looking it up again, and I legitimately can't remember if the document is in English or Japanese, but I skimmed it some time after V5 came out (it's a PDF). But even though I've been reading a lot of stuff about how vocal synthesis works (mainly for the crypton_wat thread), the document was too complex for me to fully understand what was being said.

DefiantKitsune · Oct 27, 2019

Kona said:
It could just be an effect. Something similar to how Growl was produced on V4 voicebanks, even on ones that, from a more logical standpoint, shouldn’t be able to growl with their tone (the softer/whispery VBs, since it’s just...contradicting in technique).

Vocal fry, though, is a lot easier to get away with than growl, which ended up sounding weird and unnatural more often than not. Vocal fry is the lowest register of a voice, and a breaking point in the voice. If you do a straight vocal fry on your own voice, you’ll notice your tone isn’t really there much at all. Going to a note in fry is really just going from that extreme low register to the note, so you shift registers doing it. The vocal folds behave differently in singing and fry, so that could be why it’s easier too.

Growl was just, separate recordings though. Like the engine doesn't add it, the samples exist separately - that's why miki and Yuki needed some new recordings, for example.

Vegetaljuce · Oct 29, 2019

Kazumi said:
Unfortunately, it is pre-recorded, like the breaths. If you look at the vocal fry in attack and release, they're labeled as "VocalFry F Soft/VocalFry F Power/VocalFry M" (the last 3 on each panel). The F and M stand for female and male. And if you hear how they plug in with other voicebanks compared to Chris and Amy, it's obvious it's pre-recorded and just adjusted per voicebank.

Also if you're looking for good pre/non-v5 growl examples, might I show you Lutie - he sometimes overdoes it, but when it's natural vowels, it sounds great on a ton of his videos, and it's clearly done in Piapro for all of em.

But yeah, unlike the Style Presets or Singing Skill files, they're not adjustable so I think we're stuck with what we have :\

Wow, that's unexpected but interesting. I wonder if they're just like, formant-shifting the samples to match the voicebank? I still have so many questions lol.

Also I was really impressed with the fry Lutie is able to achieve. From what I could tell, it's created by taking advantage of the distorted quality a VB gets when it's taken way out of its range into very low notes, and then shooting it back up. I just spent about an hour playing around with that technique in V4 though, and I just can't seem to get it to sound quite as good as Lutie does. I guess I could be missing some secret component or something.
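
For anyone who wants to try the same idea, the pitch shape described above can be sketched as a short contour: hold the note far below its comfortable range, then glide up to the target. This is only a guess at the general shape, not Lutie's actual settings or anything from Yamaha. The function name `fry_onset_contour` and the defaults (a 24-semitone drop, 60 ms hold, 40 ms glide, 5 ms steps) are assumptions for illustration, and the points would still need to be mapped by hand onto whatever pitch-bend parameters your editor exposes.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def fry_onset_contour(target_note, drop_semitones=24.0, hold_ms=60,
                      glide_ms=40, step_ms=5):
    """Return a list of (time_ms, midi_pitch) points for a fry-style onset.

    The pitch sits `drop_semitones` below the target for `hold_ms`, then
    rises linearly (in semitones) to the target over `glide_ms`.
    All defaults are illustrative guesses.
    """
    start_pitch = target_note - drop_semitones
    points = []

    # Hold phase: stay in the distorted low range.
    for t in range(0, hold_ms, step_ms):
        points.append((t, start_pitch))

    # Glide phase: shoot back up to the real note.
    steps = max(1, glide_ms // step_ms)
    for i in range(steps + 1):
        frac = i / steps
        points.append((hold_ms + i * step_ms,
                       start_pitch + frac * drop_semitones))
    return points


contour = fry_onset_contour(60)  # target: middle C (MIDI note 60)
contour_hz = [(t, midi_to_hz(p)) for t, p in contour]
```

Shortening `glide_ms` makes the jump snappier, while a deeper `drop_semitones` pushes the bank further into the distorted range before the note starts.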

Vegetaljuce · Oct 29, 2019

uncreepy said:
My unhelpful/vague comment about how V5 works:
The patent for Vocaloid 5 can be found on Google's patent search engine. It explains the algorithms for basically everything and includes diagrams of when each "setting" kicks in. It shows each note block broken into sections, with math and a sine wave next to it to calculate stuff. I really don't feel like looking it up again, and I legitimately can't remember if the document is in English or Japanese, but I skimmed it some time after V5 came out (it's a PDF). But even though I've been reading a lot of stuff about how vocal synthesis works (mainly for the crypton_wat thread), the document was too complex for me to fully understand what was being said.

I could just be an idiot, but I searched all sorts of combinations of relevant keywords and I wasn't able to find any results for Vocaloid 5 on Google Patents. If you ever find the link again, I'd love to take a look at it!

uncreepy · Oct 29, 2019

Check this thread by @Exemplar: Round up of Yamaha's patents with regards to singing synthesis
The "covered a bunch of these back in November" link there goes to an older thread that has the patents with the diagrams. Some of the images are in Japanese, but some are in English. For example, the "Display control method and editing apparatus for voice synthesis" patent has English images, and it talks about attack and release, including vocal fry.
