I’m not gonna lie, this sounds… exactly like I was fearing it would sound. If that’s what it sounds like after nine months of work… still, at least they actually put out a sample, even if it’s three months late. I wasn’t expecting that.
I really like Audine's voice provider's tone! I'm not usually a fan of cute voices, but she sounds sweet and endearing in a very natural way.
I'm glad to hear a sample showing how far they've progressed programming a whole new engine in just over half a year. I'm not going to bother judging the sound, given how short the sample is and the fact that they're in alpha, but it's fun to hear for historical purposes.
Looking forward to more, I hope things continue to progress smoothly.
Eh, my issue is: I get it’s still in alpha, but at this point they’ve been working on it actively since January. I feel like they reeeeeaaaally should have more at this point. They’ve been more focused on building and adjusting reclists than actually recording and utilizing those reclists, and it shows. Had they just kept focus on their launch languages I think the results would be a lot smoother… it’s just the result of unrealistic expectations I think. They’ve wasted a lot of time. Please don’t massacre me, I’m just disheartened
ok so, i cant be the only one hearing some terrible audio peaking, at least in the first sample, right? like, oh my god, it does not sound good to me. i dont mind audine's tone though, its nice! i hope it is just an issue with the twitter upload, but that volume is not good lol.
my issue from the beginning was how they made it sound like they had a lot less to go through to have a completed engine and voice, and with this progress its pretty clear they likely underestimated the scope of it. not to say they cant do it, but i always felt from the start they were really focused on the part that wasnt actually technical. obviously its hard to get a coder who can do such a complicated thing! im not exactly shocked by this progress, but i just think they really havent focused on whats important, or dont know how to. the kickstarter wouldve been a disaster if this was the same or similar progress theyd be making, so im glad it didnt happen and they can do this without the consequence of failing or falling behind expectations. but its still obvious to me theyre lacking a lotta things to make this kinda thing happen in a more professional way.
like i think the key thing to me is remembering what they expected to do on the kickstarter schedule, which is very obviously not being achieved like they planned/predicted, and i dont think it really has to do with money either. i hate to have this sound like a downer post because i dont mean it that way. i dont want to see this fail, i really am just sitting back chillin and waiting for some cool progress, which we got! but from the kickstarter plans to now, i havent really found confidence in how they manage themselves, and i find that important to consider when it comes to buying a product.
I don't want to be down on them, but I hope they improve the synthesis; it sounds very lifeless and robotic, though that might be their goal, especially with the synthesis being one-to-one from the source data. At least the voice providers sound nice. I hope they have a decent recording setup, because I don't want a repeat of Oliver.
forgot their names so i will name them by colors...i hate how the yellow one sounds. the purple one's voice has potential, i like the tone of it, but it is kinda hard to tell with these samples...especially because theyre all one note and not actually singing, like what
also geez has it really been 9 months? time flies wow....
All of this is highly speculative, but I think I might have an idea of what's going on.
They showed how they can detect the F0 and formants, and in both samples it sounded like some of the formants were additionally attenuated, and therefore quieter than in the original audio. Also, this is happening not only where sample transitions might be, but throughout the audio clips.
Together, this would suggest to me that their engine is "pitch-adaptive", meaning the data batches it processes each represent a single F0 wave, i.e. a single vocal cord vibration, instead of a fixed amount of time. (They might also be processing a fixed number of vocal cord vibrations at a time, but that wouldn't really change things.)
This has several advantages for the math running in the background. I'm not sure how to explain the exact reasons, but the end result is that pitch-shifting samples and a few other things, like Gender Factor modifications, are far easier than they would be if the engine used a fixed tick rate. So I wouldn't be too worried about the sample having the same pitch as the original recording right now; it's unlikely they'll have problems with pitch changes, and parameters like Gender Factor and Breathiness should work well too. (Btw, most commercial concatenative vocal synths also use pitch-adaptive synthesis.)
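To show why pitch changes come almost for free in a pitch-adaptive design, here's a rough pure-Python sketch of the classic pitch-synchronous overlap-add (TD-PSOLA) idea that many concatenative synths use. Everything here (the function names, the synthetic single-cycle grains, the triangular window) is my own construction for illustration, not anything from Maghni's actual engine:

```python
import math

def make_period(n):
    # one synthetic "vocal cord" cycle: a decaying sine, n samples long
    return [math.exp(-3 * t / n) * math.sin(2 * math.pi * t / n)
            for t in range(n)]

def psola_shift(periods, ratio):
    """Re-space single-cycle grains to change F0 by `ratio`.

    periods: list of single-cycle waveforms (the variable-length "data
             batches" of a pitch-adaptive engine, one per vocal cord
             vibration)
    ratio:   > 1 packs the grains closer together (higher pitch),
             < 1 spreads them out (lower pitch)
    """
    out = []
    pos = 0.0
    for grain in periods:
        start = int(round(pos))
        end = start + len(grain)
        out.extend([0.0] * (end - len(out)))   # grow the output buffer
        for i, s in enumerate(grain):
            # triangular window so overlapping grains cross-fade
            w = 1.0 - abs(2.0 * i / (len(grain) - 1) - 1.0)
            out[start + i] += w * s
        pos += len(grain) / ratio              # new, shifted hop
    return out

# ten identical 100-sample cycles, then the same grains re-spaced
src = [make_period(100) for _ in range(10)]
same = psola_shift(src, 1.0)   # original spacing, 1000 samples
up = psola_shift(src, 1.25)    # same grains at a smaller hop: higher F0
```

The key point is that the grains themselves are untouched, so the formants (which live inside each cycle) stay put while only the cycle spacing, i.e. F0, changes. With a fixed tick rate you'd have to resample or re-analyze instead.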
However, the biggest disadvantage of pitch-adaptive engines is that the data batches used by the engine each contain different amounts of data. This makes constructing the transitions between them a lot harder. If it isn't done well enough, these flawed transitions will be audible as selective attenuation of formants, like what can be heard in the sample. Mitigating this effect is an extremely complicated task, and over a dozen research papers have been written on the subject, each proposing different methods for how to deal with the problem.
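To make the "flawed transitions" point concrete, here's a toy example (again entirely my own, nothing to do with their engine) of why joining two grains whose phases don't line up produces an audible click, and how even a short linear cross-fade over the seam shrinks the discontinuity:

```python
import math

def crossfade_concat(a, b, n):
    # overlap the last n samples of a with the first n of b,
    # fading a out and b in linearly
    head = a[:len(a) - n]
    fade = [a[len(a) - n + i] * (1 - i / n) + b[i] * (i / n)
            for i in range(n)]
    return head + fade + b[n:]

def max_jump(x):
    # largest sample-to-sample discontinuity: a crude "click" detector
    return max(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))

# two grains whose phases don't match at the joint
a = [math.sin(2 * math.pi * t / 100) for t in range(100)]
b = [0.8 * math.sin(2 * math.pi * t / 90 + 2.0) for t in range(90)]

click = max_jump(a + b)                          # big jump at the seam
smooth = max_jump(crossfade_concat(a, b, 20))    # much smaller jump
```

A real engine has to do much better than a blind cross-fade (which smears formants exactly as described above), but this is the basic shape of the problem.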
But I think Maghni AI is using its own algorithm for this task instead of one from a paper. During their original Q&A, I asked several questions about their synthesis algorithm, which they understandably didn't want to fully answer. However, they did say that their engine isn't using Fourier transforms, but rather a custom, similar algorithm. One of the many variants of the Fourier transform is the so-called "inverse short-time Fourier transform", which is used to reconstruct the transitions between data batches in non-pitch-adaptive engines. So this quote can be interpreted as them adapting that algorithm for their pitch-adaptive synth.
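For context, this is what the textbook inverse-STFT overlap-add looks like in a fixed-hop (non-pitch-adaptive) setting: windowed frames are transformed, processed, transformed back, and summed, and a periodic Hann window at 50% overlap satisfies the constant-overlap-add (COLA) condition, so an unprocessed signal survives the round trip exactly. This is just the standard technique, not their custom variant (a naive O(N²) DFT is used to keep it dependency-free):

```python
import cmath, math

def dft(x):
    # naive discrete Fourier transform (fine for a small demo)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def istft_roundtrip(x, frame=64, hop=32):
    # periodic Hann window: win[n] + win[n + hop] == 1 at 50% overlap,
    # which is the constant-overlap-add (COLA) condition
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame)
           for n in range(frame)]
    out = [0.0] * len(x)
    for start in range(0, len(x) - frame + 1, hop):
        spec = dft([x[start + n] * win[n] for n in range(frame)])
        # (a real engine would modify `spec` here before inverting)
        rec = idft(spec)
        for n in range(frame):
            out[start + n] += rec[n]   # overlap-add the frames back
    return out

x = [math.sin(0.1 * n) for n in range(256)]
y = istft_roundtrip(x)
# away from the edges, y matches x to floating-point precision
```

The difficulty the post describes is that in a pitch-adaptive engine the "frames" are single pitch periods of varying length, so a fixed window and hop like this no longer satisfies COLA, and you have to invent something else.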
Assuming any of this is correct, which I wouldn't be too sure about, it would mean they've been working on the part that causes most of the deviation of the synthesized sound from the original since way before the first Q&A, and probably still are. It's very impressive that they took on such a task at all, but it's still kind of a gamble: if they get it wrong, there's almost no way to work around it, but if they get it right, it drastically improves the quality of the engine in other areas.
Please keep in mind that this really is speculation. I only know the same audio samples as you, and I've based a lot of assumptions on just that (and a question from an old Q&A).
The demos were definitely not ready for primetime. They should have waited and not shown anything at this point, as a vocoder demo doesn't really promise a whole lot to the average person.
I still find it odd that there is no real tonal loss but massive amounts of additional noise content. Generally, a universal vocoder has drops/loss in tonal content.
Making an NN synth is a massive undertaking, but it looks like the devs are being asked to do way too much by Vocatone. It should have started much smaller, with additional functionality added as needed, not all at once.
Unfortunately, the demos/video look to have been rushed together by someone who doesn't understand the tech (such as marketing staff).
Or possibly they just didn't have anything to show, and the demos were fabricated with other tools in a panic to keep up with people's unreasonable expectations. That isn't unheard of, considering how demanding people can be when it comes to vocal synths. I hope that isn't the case.
If I recall correctly, it's actually not an NN synth in the way the term is normally used; they're just calling it AI because it's going to have either AI-assisted or automatic tuning.
I think they said during the Q&A that it's a concatenative engine, and then clarified what the AI is used for when answering a later question.
Either way, I completely agree with you that it's a huge undertaking. They announced their project way too early, and that is now making things a lot harder for them, since everything happens under the eyes of tons of people, so I can see why they might feel the need to rush something out.
Hopefully it all works out in the end!