Tutorial: Making a Vocal Synth PV Start to Finish

Introduction:
If making music is one black art associated with the vocal synth fandom, then making the PVs (music videos) that often accompany those songs is the other. This guide will take you through the process and software I used to make the music video for “Walk with Me.” It covers every piece of video-editing software currently in my workflow, so it will hopefully provide a good starting reference for your own projects.

Note: In keeping with its electronic/synthwave/chillwave genre, “Walk with Me” features animations (often called visualizations) that are synced to the beat of the music. These animations represent some elements of the music visually. The video for “Walk with Me” doesn’t include animated images of the vocal synths themselves. Much of the same software would be involved in making a video that included those animations, but you would also need appropriate software for drawing/animating the vocal synth characters.

Note: This guide features the software that I use, but there are many other options you may want to consider. See this thread for some others.

To give you a clear idea of what this guide produces, here is the “Walk with Me” PV:

[Embedded video: the “Walk with Me” PV]

Finishing the Song/Choosing Elements to Animate:
The first step in creating the music-synced PV was to finish my song. As I worked, I had ideas about which elements of the music (i.e., instruments/sounds) I wanted to represent as visualizations. I kept those ideas in mind and used them when it came time to make the animations.

Sourcing Background Images:
I wanted my PV to feature music visualizations over a background image. Piapro is a wonderful source of images (as well as other items) that members of the community who excel in particular areas (art, music, lyrics, MMD modeling) share for the use of other creators.

I ended up wanting to combine two images from different creators on Piapro. One was of a cityscape/street at night; the other was an illustration of Hatsune Miku. Both had licenses that seemed like they would permit this kind of editing. However, I followed the advice of @MagicalMiku and messaged both of these artists on Piapro to check with them. In both messages, I:
  • Listed both artists/artworks that I wanted to use
  • Described how I wanted to use/change the artwork of the artist I was contacting
  • Described what I planned to use the artwork for (in my case, the PV of a song, to include music-synced animations, to be posted on Nico Nico for the Vocacolle event and on SoundCloud)
  • Used the simplest English I could, and also included a Japanese translation
  • Included a simple mock-up image to give them an idea of the proposed combined artwork. In my case, this was a simple line sketch such as a child might do on a restaurant napkin, but it was enough to convey the idea unambiguously.

Piapro/Character Terms:
Works on Piapro are usable subject to the terms their creators apply when uploading them for others to use. The easiest of these terms to understand are indicated by small icons on each work’s page: you’ll see things like a yen symbol with a line through it, a pair of scissors, etc. It’s also possible for a creator to apply a custom license, in which case you must contact them before using the work and settle the details of your use with them.

As a courtesy (and in some cases a requirement) of using materials from Piapro, you are encouraged to leave a comment on any Piapro works you use whenever you can. In the comment, let the creator know that you used their creation, thank them, and provide a link to your creation that uses theirs. In the description of your creation, you are often also required to credit the creator of the work you used and link back to them/their creation, although this can vary according to the license they applied. Be sure to keep these obligations in mind when posting your work.

This guide is not intended as a guide to licensing/legal use of items from Piapro, or the usage of the Piapro site in general. More details on that can be found on the Piapro site, usually in its help pages.

Bear in mind that there are often guidelines on the usage of vocal synth characters’ images/names/etc., too, which are often described in those characters’ End User License Agreements (which you’ll usually get when you purchase them or on their companies’ websites). For the Crypton vocal synths, you can generally find these terms in the EULAs and/or on Piapro/the pages to which it links. This guide is not intended as a guide to usage of vocal synth characters’ images/names/etc., either.

My apologies for not providing more detail on the items above, but it’s a somewhat large, detailed, and complex topic, and it is subject to change. I am also not a lawyer; I’m just a hobbyist who tries to parse these things as best they can.

Image Editing:
To make the PV, I had to digitally cut out the illustration of Miku and insert it into the cityscape image. I also adjusted the illustration’s color so that she appeared to be in shadow, since the cityscape was at night. I did these and a few other minor edits in Corel Photo-Paint, although you could also use other graphics software, such as Clip Studio Paint, Adobe Photoshop, GIMP, or others.
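
If you ever wanted to script this sort of composite instead of doing it by hand, the same edits can be sketched in Python with the Pillow library. This is purely an illustration; the file names, paste coordinates, and brightness factor are hypothetical, and I did all of this in Photo-Paint’s GUI.

    from PIL import Image, ImageEnhance

    # Load the night cityscape and the cut-out illustration (a PNG with transparency).
    background = Image.open("cityscape_night.png").convert("RGBA")
    cutout = Image.open("miku_cutout.png").convert("RGBA")

    # Darken only the color channels (not the alpha) so the figure reads as
    # standing in shadow; scaling alpha would make her semi-transparent instead.
    r, g, b, alpha = cutout.split()
    darker = ImageEnhance.Brightness(Image.merge("RGB", (r, g, b))).enhance(0.6)
    cutout = Image.merge("RGBA", (*darker.split(), alpha))

    # Paste the cut-out onto the background, using its alpha channel as the mask.
    background.paste(cutout, (400, 250), mask=cutout)
    background.save("composite.png")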

Making the Music-Synced Animations (Visualizations):
To make the visualizations, I used software called Visual Synth by Imaginando.

VS is a VST plugin that you can add to your song project. (It also comes with a standalone version that accepts audio/MIDI files and runs outside your DAW.) VS can listen to MIDI channels 1-8 and to any audio that reaches the bus channel to which you’ve added the VS plugin (so I put VS on the final Master bus, where all the audio ends up). VS comes with a number of preset animations (which are GLSL shaders/fragment shaders) that you can control using the incoming MIDI/audio information.

From VS’s perspective, MIDI basically provides 2 kinds of information: 1) the pitch of the note on the keyboard, and 2) the velocity of the note. Audio can be run through 1 of 4 audio filters, which basically detect whether the volume of the audio goes above a certain level (in which case the filter triggers the VST to do something, the way the arrival of a MIDI note does). Optionally, the audio filters can listen only to a particular frequency range, so you can control somewhat more finely which sounds in the audio you’re responding to.
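
As I understand it, each audio filter is doing something conceptually like the sketch below (Python with NumPy; the band limits and threshold are made-up numbers for illustration, not anything from VS itself).

    import numpy as np

    def band_level(samples, sample_rate, low_hz, high_hz):
        """Measure one audio block's volume within a frequency band."""
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
        return spectrum[(freqs >= low_hz) & (freqs <= high_hz)].mean()

    THRESHOLD = 25.0
    block = np.random.randn(1024)  # stand-in for one block of incoming audio
    if band_level(block, 44100, 40, 120) > THRESHOLD:
        print("trigger something, the way an arriving MIDI note would")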

As another source of "input," VS includes 4 LFOs. If you're not familiar with them, think of an LFO as something that automatically draws a repeating line on a piece of graph paper. Depending on how quickly the line is drawn and what (repeating) shape is drawn, the line has a different Y value from moment to moment, so it provides a stream of automatically generated values that we can do things with.
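
In code terms, an LFO is just a repeating function of time. A minimal sketch (the rate and shapes here are illustrative, not VS's actual options):

    import math

    def lfo(t, rate_hz=0.5, shape="sine"):
        """Return the LFO's value (0..1) at time t for a given speed and shape."""
        phase = (t * rate_hz) % 1.0          # where we are in the repeating cycle
        if shape == "sine":
            return 0.5 + 0.5 * math.sin(2 * math.pi * phase)
        if shape == "saw":
            return phase                     # ramps 0 -> 1, then starts over
        raise ValueError(shape)

    # Sample the "line on the graph paper" once per video frame (30 fps).
    values = [lfo(frame / 30.0) for frame in range(90)]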

Finally, there are 2 "envelopes" (similar to the ADSR envelopes on instruments, although VS uses only attack and release) that are triggered on the arrival of each MIDI note, and you can choose to use either of them alongside the other sources, too. The envelopes basically control how quickly VS's response to a MIDI note rises and fades away.
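
An attack/release envelope can be sketched the same way: from the moment a note arrives, the value ramps up over the attack time and then falls back over the release time. Something like this (times in seconds, purely illustrative):

    def ar_envelope(t, attack=0.05, release=0.5):
        """Value (1 at the peak, 0 at rest) t seconds after a MIDI note arrives."""
        if t < attack:
            return t / attack                       # ramp up
        if t < attack + release:
            return 1.0 - (t - attack) / release     # fade back down
        return 0.0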

Taken together, that all means we have several potential sources of "input" in VS: 1) the audio volume (as filtered for relevancy by the 4 listeners), 2) the MIDI pitch, 3) the MIDI velocity, and 4) the 4 LFOs. There are also the 2 envelopes, which VS displays the same way as the other inputs, but which are really driven by the incoming MIDI information. Each of these sources basically outputs numbers.

The animations in VS are something called GLSL shaders/fragment shaders. They're basically little scripts that draw things on the screen. Being scripts, they can take input parameters and use them to produce their output (the drawings), so the programmatically generated drawings change depending on the input.
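
I won't reproduce real GLSL here, but conceptually a fragment shader is a little function that the graphics card runs once per pixel, every frame. As a Python analogy only (not VS's actual code), it works something like this:

    def fragment_shader(x, y, intensity):
        """Compute one pixel's color from its position and an input parameter."""
        # A radial glow whose brightness could be fed from MIDI/audio input.
        dist = ((x - 0.5) ** 2 + (y - 0.5) ** 2) ** 0.5
        brightness = max(0.0, intensity - dist)
        return (brightness, brightness * 0.5, 1.0 - dist)   # an (R, G, B) triple

    # The GPU effectively evaluates this for every pixel of every frame:
    frame = [[fragment_shader(x / 320, y / 240, 0.8)
              for x in range(320)] for y in range(240)]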

To match the input from the audio/MIDI to the input parameters of the shaders, VS provides a grid that maps the available input values to the input parameters of each shader you've chosen. In each cell of the grid, you provide yet another number, which I assume is a multiplier that lets you scale how much of the input value is passed to the shader.
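
In other words, the grid behaves like a modulation matrix. Here's a sketch of what I assume the number in each cell does (the source and parameter names are invented for illustration):

    # Rows are input sources, columns are shader parameters, cells are amounts.
    inputs = {"audio1": 0.9, "midi_velocity": 0.6, "lfo1": 0.3}
    matrix = {("audio1", "intensity"): 1.5,    # boost the audio's influence
              ("lfo1", "rotation"): 0.25}      # let the LFO gently rotate things

    shader_params = {}
    for (source, param), amount in matrix.items():
        shader_params[param] = shader_params.get(param, 0.0) + inputs[source] * amount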

Not all shaders offer the same parameters, although there is a maximum number of parameters they can offer (the number of parameters that will fit on the grid). The animation colors are something you can set with a color picker. I'm not sure if the color can be changed by shader parameters, but the brightness/saturation/alpha often can. One shader I looked at is coded to change its color automatically, so I guess it starts with the one you set and then changes, possibly using your color as a guide.

The number of presets VS comes with is on the low side, in my opinion, but they’re GLSL shaders/fragment shaders, and you can find additional ones on various websites. Fortunately, I didn’t have to do that.

Aside from the shaders, you can also add a background image or video into VS, over the top of which the shaders will be displayed. This is where I put the composite background image of Miku/the cityscape.

I used VS in a separate Studio One project dedicated to making the video. This was primarily because Studio One doesn't do MIDI routing (the way most other DAWs apparently do), so it basically wasn't possible to share the existing MIDI data between the instruments playing the song and VS. I ended up exporting the MIDI and the audio tracks I wanted to animate into a separate project. That turned out to be a good thing: one of the advantages of using VS in a DAW is that you can use the DAW's automation features to change some of VS's parameters from the outside, and it's nice to keep those additional video-exclusive automations away from the project doing the music (which is sometimes complex enough already). I used the DAW automations to do things like fade the animations in/out and turn them on/off.

Recording the Visualizations:
Unfortunately, VS's video rendering does something weird where the length of the rendered video doesn't match the actual length of the song; time in the video seems to run faster or slower than it should. (I don't think it's anything I did; a recent VS release note even says they made the rendered time more consistent and more likely to match what you recorded....)

To get around this problem, I went the old-school route (from before Imaginando added VS's video rendering feature): I downloaded Open Broadcaster Software (OBS), the free video-capture software that YouTubers and VJs use, which was easy to set up, and wound up screen-recording VS (in its fullscreen mode) from inside Studio One.

Here are the steps I used to do that:
  • Open the Studio One video-rendering project.
  • The Studio One track should have maybe 3 bars of lead time before the start of the song. This provides time to get the screen-capture recording going.
  • The Studio One track should have some automation to indicate when the song starts and ends. I personally flipped the background image upside down during the lead time and after the song was over. This lets you know where to crop the video after it's been recorded.
  • Check any automatable VS settings that start at a random value. For example, I had a preset that (by design) rotated and started at a random position. I didn't want it to move, so I turned the rotation off; the start point was still random, so I had to set its starting position before recording.
  • Put Studio One’s time cursor at the beginning of the song.
  • Open VS's full screen view. Keep its window restored (neither maximized nor minimized); the full screen view is set to appear over the top of every other window.
  • Open OBS and choose your VS window-capture scene. (I'm assuming you've set this up already; it's self-explanatory, and the OBS website walks you through the few steps. Be sure to uncheck the option that includes your mouse cursor in the recording.)
  • Other OBS settings of note are to render to .mp4 and to set the frame rate to 30 fps (which is what my VS was producing). Also be sure to mute your desktop and mic audio in OBS, since you probably want a silent video.
  • Having too many graphically intensive windows open on the desktop may overwhelm your PC's graphics processing. To help avoid that, you want to minimize any windows you're not going to use so that they're not rendered at all. Accordingly, the process to record is as follows:
    1. Start recording in OBS.
    2. Minimize OBS.
    3. Start playback in Studio One.
    4. Minimize Studio One.
    5. Maximize the full screen VS window.
    6. Wait for the playback to complete and your automated indicator of playback completion to be shown in the VS full screen window.
    7. Minimize the VS full screen window.
    8. Stop playback in Studio One.
    9. Stop recording in OBS.
  • The recording will appear in your Documents > Videos folder. A message showing where the video was saved will appear at the lower left in OBS, and this location can also be changed in OBS's settings.

Adding the Music:
At this point, I had the following:
  • A .wav audio file of my song
  • A silent video (.mp4) containing the background image of Miku/the cityscape with the visualizations over the top of it. Since it was recorded manually, the video had a little extra footage before/after the intended material.
I added these components into Corel VideoStudio, where I trimmed off the excess video material and rendered a new video that included the audio.
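
(If you don't have Corel VideoStudio, the same trim-and-mux step can be done with the free ffmpeg command-line tool. A sketch, driven from Python for consistency with the other examples; the 6-second trim amount and file names are hypothetical.)

    import subprocess

    # Cut a hypothetical 6 s of lead-in footage and mux the song audio in.
    subprocess.run([
        "ffmpeg",
        "-ss", "6", "-i", "silent_visuals.mp4",  # skip the lead-in footage
        "-i", "song.wav",                         # the finished song
        "-map", "0:v", "-map", "1:a",             # video from file 0, audio from file 1
        "-c:v", "copy", "-c:a", "aac",            # copy video as-is, encode audio to AAC
        "-shortest", "pv_with_audio.mp4",
    ], check=True)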

Creating Subtitles:
Subtitles can be added in Corel VideoStudio, as well as probably many other video-editing software products. However, I often use the free Aegisub. There are a few reasons for this:
  • Splitting the subtitles out into their own file keeps them contained and reusable in other programs.
  • Once subtitles are added into Corel VideoStudio, they’re part of that file, and VideoStudio makes them somewhat hard to edit as a group after the fact.
  • Aegisub has some formatting options that Corel doesn’t have.
  • You can define styles (e.g., font size, color) in Aegisub and reuse them in multiple projects.
  • Text can be easily positioned as a group in different places in Aegisub. In Corel, I believe you would have to move each subtitle line manually.
Aegisub supports a few different subtitle file formats. I personally use its native .ass format, because the VLC media player understands it. A .ass file can contain the subtitle text, style formatting, and subtitle time codes. However, Aegisub can also work with other common subtitle formats, including .srt; those formats may just carry different data and be supported differently by other applications.
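
For reference, a stripped-down .ass file looks roughly like this (the style fields are abbreviated and the subtitle line is invented; a real file exported from Aegisub declares more columns in its Format lines):

    [Script Info]
    ScriptType: v4.00+

    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, Alignment
    Style: Default,Arial,48,&H00FFFFFF,2

    [Events]
    Format: Layer, Start, End, Style, Text
    Dialogue: 0,0:00:12.00,0:00:15.50,Default,An example subtitle line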

Adding Subtitles to the Video:
Subtitles can be added to the video using the free VLC media player. In the top menu, go to Media and choose Stream. From here, you can stream the video you have so far while also selecting the subtitle file to use at the same time. You're also given the chance to adjust the settings so the final output video is compatible with the destination platform (e.g., Nico Nico, YouTube, etc.). You'll need to look up which settings that platform wants you to use (file format, etc.) and set them, but once that's done, you can save them so they can be used again.

The output of this process will be a new video on your hard drive. The subtitles will be part of the video itself in the same way that the animations are. Subtitles like this are often called hardsubs.

If you prefer subtitles that are more separate from the video (like the automatically generated ones on YouTube), you could add the subtitles differently. Rather than streaming through VLC to re-record the video, I believe you could simply provide a subtitle file of an appropriate format to YouTube and let YouTube render the subtitles, for example.