My ideal vocal synth contains multiple engines, or renderers, ranging from clearer to more robotic output. It can work as a sample-based synthesizer or by analyzing a voice bank, like the first version of Vocaloid. It works in any DAW, has very low system requirements, and runs on every major operating system. It lets users create their own voice banks through a convenient UI for recording and processing samples. Its interface resembles the current Piapro Studio.
It allows cross-synthesis between any voice banks ever made. It has tools for finding and replacing phonemes automatically, so that, for example, Japanese phonemes can be swapped for the closest Spanish phonemes, letting a Spanish voice bank sing in Japanese without editing every phoneme by hand. It can handle voice banks from every brand. It offers extensive vibrato customization.
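To make the phoneme find-and-replace idea concrete, here is a minimal Python sketch, assuming each voice bank labels its phonemes with X-SAMPA-style symbols. The table JA_TO_ES and the function substitute are hypothetical names for illustration, and the mappings are rough approximations, not any vendor's actual phoneme sets. Phonemes with no entry pass through unchanged, and one phoneme can expand into several (Spanish has no /ts/, so it becomes t + s).

```python
# Hypothetical Japanese -> Spanish substitution table. Symbols are
# X-SAMPA-style and assumed for illustration; real voice banks define
# their own phoneme sets.
JA_TO_ES = {
    "M": ["u"],        # Japanese unrounded "u" -> Spanish "u"
    "4": ["r"],        # Japanese alveolar flap -> Spanish tap
    "p\\": ["f"],      # bilabial fricative in "fu" -> Spanish "f"
    "ts": ["t", "s"],  # Spanish has no /ts/, so approximate with two phonemes
    "N\\": ["n"],      # Japanese moraic nasal -> Spanish "n"
}


def substitute(phonemes, table):
    """Replace each phoneme using the table; phonemes without an entry pass through."""
    out = []
    for p in phonemes:
        out.extend(table.get(p, [p]))
    return out


if __name__ == "__main__":
    # "tsuki" (moon) written with the assumed Japanese symbols
    print(substitute(["ts", "M", "k", "i"], JA_TO_ES))
```

A real tool would probably also need context-dependent rules, since the best substitute for a phoneme can depend on its neighbors.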
It has versatile and effective growl parameters, based on analysis of the voice bank, for adding grit to the voice, vocal fry, false cord screaming, and fry screaming. It has a power parameter that makes soft voice banks sound more powerful and powerful voice banks sound softer. Perhaps most important of all, it includes a virtual model of the vocal tract that lets users create entirely new synthetic voices by altering that virtual tract, and it can produce every human speech sound out of the box, with no need for voice providers.
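As a rough illustration of why altering a virtual vocal tract changes the voice, here is a sketch of the simplest textbook case. It is a swapped-in simplification rather than a full vocal tract model: a uniform, lossless tube closed at the glottis and open at the lips, whose resonances (formants) are F_n = (2n - 1)c / (4L). The function name tube_formants and the speed of sound of 350 m/s are assumptions for illustration. A 17.5 cm tract gives formants near 500, 1500, and 2500 Hz, and shortening the tube raises every formant.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonant frequencies (Hz) of a uniform lossless tube, closed at one end
    (glottis) and open at the other (lips): F_n = (2n - 1) * c / (4 * L).

    This is an assumed, highly simplified stand-in for a vocal tract model;
    it only describes a neutral, schwa-like vowel.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4 * length_m)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Compare a typical adult-length tract with a shorter one.
    for length in (0.175, 0.14):
        print(length, [round(f) for f in tube_formants(length)])
```

A practical articulatory synthesizer would replace the single uniform tube with many short segments of varying cross-section, which is what lets it shape different vowels and consonants.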
Oh, and that last part is not science fiction. Speech synthesis based on modeling the vocal tract (articulatory synthesis) already exists. I don't know whether anyone has tried getting such a program to sing yet, and in any case, it's cheaper and easier to just use samples. Plus, I think fans of singing synthesizers and of human singers will consider it sacrilege for a computer program to be able to sing anything in any voice, including the voices of their favorite Vocaloids or live singers, but people will get used to it.