I'll do my best to explain!
As context for anyone who doesn't know, traditional vocal synths are made using the 'concatenative' method. A singer is hired to sing all the individual speech sounds of a language, called phonemes, at various pitches. These short recordings of vowels and consonants are downloaded onto your computer when you buy a concatenative Vocaloid, and the Vocaloid software stitches the phoneme recordings together to form the words you type into the program. It's a simple method that lets a program sing any word without anyone needing to record every word in the dictionary.
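To make the idea concrete, here's a tiny toy sketch in Python of what 'stitch the phoneme clips together' means. This is not Vocaloid's actual engine or file format; the phoneme names, crossfade length, and fake sine-wave 'recordings' are all invented just to show the shape of the process.

```python
# Toy sketch of concatenative synthesis: look up one pre-recorded clip per
# phoneme and join them end to end with short crossfades. Everything below
# (clips, names, parameters) is made up for illustration.
import numpy as np

SAMPLE_RATE = 44100

def fake_clip(freq_hz, seconds=0.25):
    """Stand-in for a real studio recording of one phoneme."""
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

# Hypothetical "voicebank": one clip per phoneme at one pitch.
voicebank = {
    "h": fake_clip(200),
    "e": fake_clip(330),
    "l": fake_clip(250),
    "o": fake_clip(290),
}

def concatenate(phonemes, crossfade=0.02):
    """Join phoneme clips, overlapping each adjacent pair by a short crossfade."""
    fade_len = int(SAMPLE_RATE * crossfade)
    out = voicebank[phonemes[0]].copy()
    for p in phonemes[1:]:
        clip = voicebank[p]
        fade = np.linspace(0, 1, fade_len)
        # Blend the tail of the running output into the head of the next clip.
        out[-fade_len:] = out[-fade_len:] * (1 - fade) + clip[:fade_len] * fade
        out = np.concatenate([out, clip[fade_len:]])
    return out

word = concatenate(["h", "e", "l", "o"])  # roughly, "hello"
```

Real engines also time-stretch, pitch-shift and smooth the joins, but the core trick is the same: every sound you hear is a slice of the original recordings.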
AI vocal synths use machine learning to create the voicebank. Instead of recording isolated phonemes, the singer sings whole songs. You then label where each phoneme falls in each song, so the software knows which sound corresponds to which slice of audio. From that labelled data, the AI learns the sonic qualities that give the singer their unique timbre and pronunciation. Those learned 'rules' are saved into a much smaller file and applied dynamically by the software, making the output sound like the singer. Rather than simply playing back a pre-recorded sound, the software calculates how a word should be sung by applying the rules it has learnt from 'listening' to the singer. Rather like a robotic impressionist.

There's a lot more complex stuff happening under the hood. For example, you can feed the AI data from multiple singers to increase the range of one voice, or make a voice sing fluently in several languages by mixing the pronunciation 'rules' learned from native-speaker data with the timbre 'rules' learned from a non-native singer.
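Here's an equally rough sketch of the AI side: fit a model that maps (phoneme label, pitch) to acoustic features using labelled songs, keep only the learned weights, then predict features for notes the singer never actually sang. Real engines use deep neural networks plus a vocoder; the linear model, feature sizes and random 'training data' below are purely illustrative.

```python
# Toy sketch of the machine-learning approach: instead of storing clips, fit a
# model that maps (phoneme, pitch) -> acoustic features, then *predict* features
# at synthesis time. A linear least-squares fit stands in for a real neural net;
# all data here is invented.
import numpy as np

PHONEMES = ["a", "i", "u", "e", "o"]
N_FEATURES = 16  # stand-in for one frame of a spectral envelope

def encode(phoneme, pitch_hz):
    """Turn a labelled note into a model input: one-hot phoneme + normalised pitch."""
    x = np.zeros(len(PHONEMES) + 1)
    x[PHONEMES.index(phoneme)] = 1.0
    x[-1] = pitch_hz / 440.0
    return x

# --- "Training": labelled frames extracted from whole songs the singer recorded ---
rng = np.random.default_rng(0)
true_weights = rng.normal(size=(len(PHONEMES) + 1, N_FEATURES))  # the singer's hidden "rules"
X = np.array([encode(rng.choice(PHONEMES), rng.uniform(110, 880)) for _ in range(500)])
Y = X @ true_weights + 0.01 * rng.normal(size=(500, N_FEATURES))  # noisy observed features

# Fit the model: these weights are the small file that ships instead of raw recordings.
learned_weights, *_ = np.linalg.lstsq(X, Y, rcond=None)

# --- "Synthesis": predict features for a note that was never in the recordings ---
new_note = encode("o", 523.25)               # an "o" on C5, for example
predicted_features = new_note @ learned_weights
# A vocoder would then turn predicted_features into actual audio.
```

The point of the sketch is the pipeline shape: label, learn, discard the recordings, predict. That's why the voicebank file is so much smaller and why the output isn't limited to sounds the singer literally produced.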
However, AI does not sacrifice your ability to make a voice sound unique. In fact, I believe AI voices are easier to edit and can achieve greater variety in performance than concatenative voices, because you're not restricted to the singer's original recordings. If you push its settings far enough, you can make one AI voice sound like several completely different people! The reason AI usage online can sound samey is a by-product of convenience: AI voices sound natural out of the box, so producers feel less need to edit the results, and consequently a lot of them stick with the default settings.
Most concatenative voicebanks sound a lot less polished by default and need much more manual work to sound natural. When you tune a concatenative voice you can hear a big improvement, which motivates producers to edit it by hand. That pushes more users to develop their own unique style.