Is it possible that Yamaha might still be using concatenative synthesis, with Vocalochanger being the only AI element? Or perhaps they're training the AI on concatenative voicebanks to keep that classic Vocaloid sound, similar to how people create 'AI' versions of concatenative voicebanks by running them through Diff-SVC. I mention this because, by default, the V6s still sound very much like concatenative voicebanks. Fuiro sounds exactly how I would expect Philo's voice to translate to concatenative synthesis, at least from what we've heard so far, and the English V6s have the same pronunciation quirks as previous English Vocaloids when not used with Vocalochanger.
Voice direction on an AI can definitely result in similar-sounding voices, but those voicebanks tend to sound like real people who happen to have similar timbres, whereas the V6s share the synthetic similarities I remember from previous versions of the engine.