Random hypothesising: I think UTAU would be the most popular vocal synthesis engine, if only it were slightly modernised.
The amount of support is insane: people pour hundreds of hours into incredible voicebanks covering by far the widest range of languages and voice types on any engine. And it's free: how do any alternatives even compete?
The reason seems to be simple, old-fashioned wonkiness. Having to change your locale to Japanese isn't hard, but it's an instant red flag that the software is outdated and not intended for Western use, which is more than enough to drive your casual software user back to its more professional-looking alternatives. And that's not even going into the early-2000s interface. Its many non-Japanese voices are tragically well made, but gather dust due to the technical challenge of using them.
So, the vast majority of people willing to jump through UTAU's hoops are those who want to use its unique capabilities: namely the ability to create your own voicebank. But this means there are very few people who download UTAU primarily to use other people's voicebanks, resulting in ridiculously unbalanced supply and demand. Hundreds of people are putting their heart into incredible voices, but not many people are waiting to use them. Everyone's on the stage, no one's in the stands.
UTAU is like a jewellery store where talented craftspeople pour their souls into creating beautiful pieces, but the jewellery is thrown to customers in a paper bag. Despite its quality, the store will have a shaky reputation, and passers-by will gravitate to options with more alluring presentation. If the vocal synth space ever gets a project like Blender that's free, customer-supported, and constantly updated, it will stomp the competition. But UTAU isn't that product. Yet.