I watched the Beef Jerky video again and thought I'd point out a few more things. (I'm clearly bored, don't ask. I think if I can figure out what each section of the upcoming Piapro does, then we can all be masters at tuning and become the next Mitchie-Ms.)
Note: I used these sites (3.3 Spectral Envelopes and https://www.sciencedirect.com/science/article/pii/S0167639316300413) for research.
F0: (0:17 - 0:26) It's the pitch (the fundamental frequency). Beef Jerky detects the pitch it thinks you're singing at. If you move the dial, it makes the pitch higher or lower by moving the whole sound up/down as a group. (When he turns the dial, you can see a dark grey version of the white group of "notes", assuming it's like VocaListener/Songle.) When you move the pitch up, it sounds like Mickey Mouse because moving everything up as a group also drags the formants (the resonances of the vocal tract) up with it, which is what a shorter vocal tract would sound like. Moving up/down is basically transposing: every 12 semitones up doubles the frequency.
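To make the "detect the pitch, then transpose it" idea concrete, here's a rough sketch in plain Python. I have no idea what method Piapro actually uses. This just uses autocorrelation (a common textbook trick) on a fake 200 Hz tone, and the function names `estimate_f0` and `transpose` are made up for the example.

```python
import math

def estimate_f0(samples, sample_rate, fmin=80.0, fmax=800.0):
    """Rough F0 guess: find the shortest lag at which the signal matches itself."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    window = len(samples) - max_lag
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation: how well the signal lines up with a delayed copy.
        scores[lag] = sum(samples[i] * samples[i + lag] for i in range(window))
    best = max(scores.values())
    # Take the shortest lag that is (almost) the best one, so we don't
    # accidentally pick two periods and report a pitch an octave too low.
    for lag in range(min_lag, max_lag + 1):
        if scores[lag] >= 0.999 * best:
            return sample_rate / lag

def transpose(f0_hz, semitones):
    """Moving up 12 semitones doubles the frequency."""
    return f0_hz * 2.0 ** (semitones / 12.0)

# Assumed test signal: a pure 200 Hz sine, sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(2000)]

detected = estimate_f0(tone, sample_rate)
shifted = transpose(detected, 12)  # one octave up
```

A real voice is much messier than a sine wave, so real pitch trackers do a lot more cleanup than this.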
Spectral envelope: (0:29 - 0:30) The spectral envelope is the smooth curve that wraps around the peaks of a sound's spectrum (the harmonics). In a real singing voice, the spectral envelope is mostly independent of pitch, because it comes from the shape of the vocal tract and not from how fast the vocal cords vibrate. But when you shift the pitch (the F0) by moving the whole sound as a group, the spectral envelope gets dragged up/down along with it. That's where the Mickey voice comes from: the harmonics keep their old amplitudes at their new, higher frequencies, so the formants end up in the wrong places, as if the vocal tract had shrunk. To fix it, you adjust the spectral envelope so it fits properly around the shifted sound, which puts the formants (Formant - Wikipedia, there's a chart on there) back where they sound natural.
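Here's a toy sketch of the Mickey effect and the envelope fix, assuming a made-up voice with a single formant at 700 Hz. Everything in it (the `envelope` shape, the 700 Hz value, the function names) is invented for illustration. A real tool would have to estimate the envelope from the recording instead of knowing it in advance.

```python
import math

def envelope(freq_hz, formant_hz=700.0, bandwidth_hz=150.0):
    """Toy spectral envelope: one bell-shaped formant (hypothetical values)."""
    return math.exp(-0.5 * ((freq_hz - formant_hz) / bandwidth_hz) ** 2)

def harmonics(f0_hz, count=40):
    """A voice as (frequency, amplitude) pairs: multiples of F0 under the envelope."""
    return [(k * f0_hz, envelope(k * f0_hz)) for k in range(1, count + 1)]

def naive_shift(partials, ratio):
    # Frequencies scale but amplitudes ride along, so the envelope is stretched too.
    return [(f * ratio, a) for f, a in partials]

def envelope_preserving_shift(partials, ratio):
    # Frequencies scale, then amplitudes are re-read from the original envelope.
    return [(f * ratio, envelope(f * ratio)) for f, _ in partials]

def loudest_frequency(partials):
    return max(partials, key=lambda p: p[1])[0]

original = harmonics(100.0)
ratio = 2.0 ** (7 / 12)  # up 7 semitones

mickey = naive_shift(original, ratio)
fixed = envelope_preserving_shift(original, ratio)

formant_before = loudest_frequency(original)  # the harmonic sitting on the formant
formant_mickey = loudest_frequency(mickey)    # the formant moved up with the pitch
formant_fixed = loudest_frequency(fixed)      # the formant stays near 700 Hz
```

In the naive version the loudest harmonic lands at about 1.5 times 700 Hz, while in the fixed version it stays close to 700 Hz even though the pitch went up by the same amount.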
Aperiodicity: (0:26 - 0:28) In vocoders (tools that break a voice down into parameters and rebuild it, like the VSTs that give you a robot voice), the speech analysis is made of three parts: the fundamental frequency (F0), the spectral envelope, and an aperiodicity parameter. Aperiodicity has to be used together with the F0 and the spectral envelope, and it's there to improve the sound quality of the synthesized speech. Aperiodic means the sound doesn't repeat at regular/periodic intervals, so this parameter covers the noisy, breathy part of the voice that the F0 can't describe. (I guess that you could use periodicity in vocal synthesis instead, but from what I read it's cruddy because the estimate depends on the position in time and wonks up the estimated speech. It's not good to use because the target spectral envelope is never static.) In physics, "aperiodic" can also describe something that is damped so heavily it doesn't oscillate, but that's a different sense of the word from the one vocoders use.
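A quick way to get a feel for aperiodicity is to mix a perfectly repeating wave with random noise and check how well the result still repeats. This is only a sketch under my own assumptions: a single `aperiodicity` number between 0 and 1 for the whole sound (a real vocoder may track it per frequency band), and the helper names `make_voice` and `periodicity` are invented for the example.

```python
import math
import random

def make_voice(aperiodicity, n=2000, period=40, seed=0):
    """Mix a periodic wave and noise. 0.0 = fully periodic, 1.0 = pure noise."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    out = []
    for i in range(n):
        periodic = math.sqrt(2.0) * math.sin(2 * math.pi * i / period)  # unit power
        noise = rng.gauss(0.0, 1.0)                                     # unit power
        # Mix by power, so aperiodicity is the noise share of the energy.
        out.append(math.sqrt(1.0 - aperiodicity) * periodic
                   + math.sqrt(aperiodicity) * noise)
    return out

def periodicity(samples, period=40):
    """Normalized match between the signal and itself one period later."""
    a = samples[:-period]
    b = samples[period:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

clean = periodicity(make_voice(0.0))    # close to 1: repeats perfectly
breathy = periodicity(make_voice(0.5))  # somewhere around the middle
noisy = periodicity(make_voice(1.0))    # close to 0: no repetition at all
```

The more noise you mix in, the less the sound matches a copy of itself one period later, which is roughly the thing the aperiodicity parameter is keeping track of.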
I might try to research how periodic and aperiodic waves relate to vocal synthesis (the remaining bottom 2 boxes), but I am running out of time to write and my brain hurts.