
General Discussion Thread

WyndReed

Dareka tasukete!
Apr 8, 2018
313
26
???, New York
“Note 2: After February 16, 2023, customers who registered a VOCALOID ID and purchased "VOCALOID3 Library galaco" can download "galacoTalk" from the VOCALOID SHOP download code confirmation link included in the e-mail you receive when you purchased the product.”
So you can still get it afterwards??
 

Rylitah

kiyoteru enthusiast
Staff member
Moderator
Apr 8, 2018
577
Crowdfund for UNI's fifth anniversary has been announced! It's for an album, merch of the main visual (an acrylic stand, a clear file, and a pin badge, with the pin badge using the logo), and a plush (a SeeU plush is included too)!



(Note: this doesn't imply any sort of synth update, just like how SeeU's anniversary crowdfund went. It's just an anniversary celebration.)
 

Rylitah

kiyoteru enthusiast
Staff member
Moderator
Apr 8, 2018
577
Won't lie, it doesn't sound like a person at all, but it's still cool to know this exists
Going by the poino site, that's because there are no voice actors involved. Similar to Adachi Rei, who also has no voice provider. The site says this is done by editing the formants from the envelopes of Fourier transforms (or maybe it uses Fourier transforms to synthesize the sounds? I don't really understand the technical aspect of this, haha). Either way, it'll probably never sound perfectly clear or human, but it's a valiant effort.

... Though I wonder if the first character's name being Reichii (funnily enough, the name of the character on the GitHub download is spelled "Reinii" or "Rainy") is invoking Adachi Rei on purpose, considering they're more or less the same thing. They sound pretty similar too, probably because of the source of their voices. Both licenses are also pretty free for personal and commercial use (Adachi Rei only requires a commercial license for big businesses/people expecting to make a huge profit off her character/voice, though small-scale paid doujin works are fine to make and distribute without one).

Really, the big difference between them is that Adachi Rei's singing voicebank is for UTAU (with her speaking voice being a paid A.I.VOICE product), while poino seems to be fully its own free thing. That's pretty neat.
 

pico

robot enjoyer
Sep 10, 2020
530
A SIGNAL PROCESSING MOMENT?! ON MY FORUM? IT'S MORE LIKELY THAN YOU'D THINK! :piko_ani_lili:

The inverse Fourier transform generates the output signal in real time. Fourier transforms are basically the most common way of interacting with signals in signal processing. On the most basic level, a Fourier transform converts a signal from the time domain to the frequency domain. It's easiest to understand by looking at a picture:

Before the Fourier transform is performed, we see a signal oscillating at a constant frequency. But it's kind of hard to understand and modify in this form.
After we perform the Fourier transform and convert the signal into the frequency domain, we see an impulse at the frequency the signal sits at. In this case, it's about 3 hertz (Hz). Super obvious!
If you have an input signal whose frequency (pitch) changes over time, like a human singing voice, the Fourier transform lets us see the individual frequencies!
[Attached image: the same signal before and after the Fourier transform - a sine wave in the time domain, and a single peak at ~3 Hz in the frequency domain]
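If you want to poke at this yourself, here's a tiny sketch of that exact picture, assuming Python with NumPy (my own toy example using the same 3 Hz sine, not anything from poino's code):

```python
import numpy as np

fs = 100                              # sample rate in Hz
t = np.arange(0, 2, 1 / fs)           # two seconds of time axis
signal = np.sin(2 * np.pi * 3 * t)    # a pure 3 Hz sine wave (time domain)

spectrum = np.fft.rfft(signal)                # Fourier transform -> frequency domain
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # the frequency each FFT bin corresponds to

# The spectrum is near zero everywhere except one sharp peak:
peak = freqs[np.argmax(np.abs(spectrum))]
print(f"dominant frequency: {peak} Hz")       # -> 3.0 Hz, the impulse in the plot
```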

What an artificial voice like poino does is filter a signal at a given pitch with the Fourier transform to make it resemble a human voice.

For example, we can see what frequencies make up the vowel "a" like this:
[Attached image: spectrogram showing the frequencies that make up the vowel "a"]
Article explaining it: Identifying sounds in spectrograms
Where the spectrogram is red, there's more sound energy at that frequency. So we want to amplify those red parts of our signal to create an "a" sound, and filter out the rest.

The Fourier transform is our tool for accomplishing this. After filtering and amplifying different parts of that simple sine wave we started with, we can end up with something that sounds more like a human voice once we inverse Fourier transform it back to the original time domain, where it can be played out of your speakers as sound.

It works by taking the Fourier transform of the signal, then attenuating or amplifying specific frequencies, and finally inverse transforming the result.
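For the curious, here's a rough sketch of that transform -> reweight -> inverse-transform loop in Python/NumPy. The formant frequencies (~800 Hz and ~1200 Hz for "a") are ballpark textbook values I'm plugging in for illustration, not poino's actual numbers, and I start from a sawtooth rather than a pure sine so there are harmonics at the formants to boost:

```python
import numpy as np

fs = 44100                            # audio sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)         # one second of audio

# A harmonically rich source at a given pitch (a naive sawtooth at 220 Hz).
f0 = 220
source = 2 * (t * f0 % 1.0) - 1

# 1. Fourier transform: time domain -> frequency domain.
spectrum = np.fft.rfft(source)
freqs = np.fft.rfftfreq(len(source), 1 / fs)

# 2. Amplify the bins near the vowel's formants and attenuate the rest
#    (a Gaussian bump around each formant, on top of a small floor).
gain = np.full_like(freqs, 0.05)
for formant, bandwidth in [(800.0, 120.0), (1200.0, 150.0)]:
    gain += np.exp(-((freqs - formant) ** 2) / (2 * bandwidth ** 2))
spectrum *= gain

# 3. Inverse Fourier transform: back to the time domain, ready to play.
vowel = np.fft.irfft(spectrum, n=len(source))
vowel /= np.abs(vowel).max()          # normalize to +/-1 for playback
```

Play `vowel` through any audio library and you get a buzzy, vaguely "ah"-like tone at 220 Hz. A real formant synthesizer shapes more formants and moves them over time for each phoneme, but the core loop is exactly this.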
A good article on how filtering with the inverse Fourier transform works:
Intro. to Signal Processing:Fourier filter

It can be a lot to get your head around at first!

I think the more notable difference between Adachi Rei and this software is that Missile created Rei's voice samples by hand, manually shifting the source sine wave around in Audacity and playing with every sound before exporting it. On the other hand, this software generates the voice completely algorithmically. Rei's voice goes through a lot of different layers of processing by the time you export it from UTAU, while poino generates its voice from scratch in real time. I like Rei's voice for sitting squarely in the middle between a completely algorithmically generated sound and something lovingly crafted by a person by hand.
 


peaches2217

Give me Gackpoid AI or give me DEATH
Sep 11, 2019
1,930
26
Arklahoma
pico said:
A SIGNAL PROCESSING MOMENT?! ON MY FORUM? IT'S MORE LIKELY THAN YOU'D THINK! :piko_ani_lili: [full post quoted above - trimmed]
Damn, this is amazing! I don’t understand most of it, but I get the basic differences, and I’m amazed at how much work goes into these things.
 

pico

robot enjoyer
Sep 10, 2020
530
It gets even more complex once you get into the applications orz but it's all fascinating and you can do an incredible amount with it! It wouldn't be an exaggeration to say that modern society is practically built on Fourier transforms! lol
 

AddictiveCUL (Add)

CUL addicted!
Jan 6, 2023
102
youtube.com
pico said:
A SIGNAL PROCESSING MOMENT?! ON MY FORUM? IT'S MORE LIKELY THAN YOU'D THINK! :piko_ani_lili: [full post quoted above - trimmed]
Nerd! Kkkkkkkkk <3
 
  • Haha
Reactions: pico
