
Tutorial: The Introduction to Sound and Mixing Thread

Mar 6, 2018
Honestly, this is my second time typing this all out and it hasn't been all that fun finding time to get around to working on it the second time. I was too stupid to save a post as big as this outside of drafts here, but it won't (didn't) happen again. The time spent making this has been off and on. I've gotten tired of it some days and although I don't work on it 24/7 or anything like that, I still wanted to try to get it done because I feel there's so much misinformation out there and I hope it becomes helpful, at least to those beginners who have sent many messages asking for help.
The purpose of this thread is to try to explain not the "what", but the "why". Too often I've seen blind advice on mixing or mastering given, yet people don't understand why it is they're doing it. You'll never get much better just tackling things from a surface level, and once you understand "why" things are done rather than just "what" people do, you'll be able to decide what directions you want to take for mixing and mastering your music - only you will know where it is you want to be going. If there's anything you disagree with, want added, or don't understand, just leave a response (although I can't guarantee I'll get to it right away).

In music, mixing essentially refers to exactly what the word itself normally means for anything else. In a composition, there are multiple sounds involved that have to be mixed together. Putting instrumental tracks of, say, a bass and a guitar together can already be technically considered mixing. Simple enough on the surface, right?
Well, much like when cooking, you can't just throw random things together into a blender and expect them to work well without putting some thought into it. Making a good mix is all about getting the sounds you use to not only work well together, but to complement each other and give the overall arrangement a fuller sound without phasing issues and without frequencies fighting over space.
A good mix is about clarity, and sometimes stereo and depth consistency, but not loudness. You should be able to hear the individual parts coming through regardless of whether the volume is low (although the volume of your instruments and whatnot relative to each other should be appropriate regardless, of course). The main concern when mixing should be what frequencies are being used, not how loud the track is, or even the loudness of individual frequencies - so it's good to focus mainly on cutting the frequencies that are in conflict with EQ rather than boosting with EQ. Because of this, it doesn't matter much what speakers you use when mixing, as long as you can hear roughly 20Hz-20kHz through them and are mixing in mono to avoid stereo phasing issues. The goal should, again, be clarity.
In short, mixing can be seen as anything done to the individual tracks in a composition to help them sound clear and to keep their frequencies from conflicting with each other. The "right volume" for tracks when mixing should only be relative to the other tracks in the mix and depends on what you want. If you need to bring up the volume of a track for it to be heard over something else, the frequencies are already in conflict, and it should only be done as minimally as possible. At the same time, instruments shouldn't have frequencies cut off in a way that makes them sound unnatural, so EQ wisely.

This mysterious concept, surrounded by seas of vague explanations, is also generally considered to be the final and most crucial part of production. Many people draw the line between mixing and mastering in different places, and others still argue about what exactly mastering is, but all you need to know about it, in my opinion, are the main things that define it at its core.
Hopefully you already know this, but in mixers there is a main channel which can be used to apply effects to all the other tracks together - each of them runs through it by default and needs to go through it to be heard. This channel is called the master channel. Making changes on this master channel is what can be considered mastering, whereas mixing is about the individual tracks on their own. Just like with mixing, you might be tempted to throw all kinds of things onto the master channel and call it mastering, but getting it to sound good requires a bit more attention to detail.
Unlike mixing, mastering is all about volume. Some people like to add slight effects like reverb, and although that can make everything sound more consistent at times, I'm personally against applying those kinds of changes to all the sound together unless you have to work on a song without its stems (its individual tracks/parts) - and even in that case, you'll want to split frequencies and re-layer in a way that brings you back to the mixing process, to avoid things like phasing issues in the low end. The only effects you really need to focus on using in the master track are EQ and compression, and if and when you're using compression on a master channel, it should almost always go after the EQ.
Again unlike mixing, having a good and probably pricey set of speakers/monitors/headphones that are as close to flat in EQ as you can get can make a big difference here, because this is where we go in and adjust the master EQ. What you want is for the master to have an even sound compared to other songs by other people, and to sound right coming out of the speakers other people actually use. It's not completely necessary to have great, pricey speakers, but I would recommend having at least one good set, along with something closer to the lower-quality output your target audience is likely to listen through.
Regardless of whether you mixed with cheap equipment or not, the first step of mastering, after making sure the mix isn't clipping, is making sure all the frequencies are at a good volume on good equipment - in other words, EQ. The next steps depend on what your music is going to be for, but usually you'll want to hear it through other playback systems like cheap earbuds or car stereos, depending on your expected audience. The main goal here is getting the track's EQ reasonably standard for the average intended playback system, though it can also depend on the other tracks in a group or playlist, with volume kept relative to those.

Understanding Sound

You probably know sounds are caused by vibrations, which are commonly represented by two-dimensional waveforms, right? From that you can logically conclude that the waveform must always be in motion to be heard, and that is true, but we have to go a little further to really understand what sound is. Instead of thinking of sound as something like a frame in a video or the images in a filmstrip, comparing each waveform to an image ready to be projected, think of it more like the electron beam in a CRT monitor - a light focused on a single point moving from one side of the screen to the other, forming countless rows of pixels in different colors to ultimately form the image you see within the blink of an eye. Another way to think of sound is like the needle drawing out a polygraph chart, but moving many, many times faster.
In reality, sound is pressure and vibration in three dimensions, but in such a way that it's always at a single position at each tiny moment in time, constantly moving. Those changes in position, and that movement through time, are what allow us to perceive sound; to hear it. Each small section of a waveform like a sine is made up of seemingly infinite micro-positions constantly in motion in the real world from the vibration of matter. The idea of hearing multiple separate sounds at a time is really a sort of misleading generalization, while even a single pitch from a sine is just the result of something moving back and forth between positions at an even pace.
Unlike in the digital world, in the natural/analog world a waveform can't easily be forced to stay still or to move irregularly, such as suddenly jumping downward or holding still from distortion. The waveforms we see as two-dimensional images are, after all, just a concept to help us visualize these vibrations in limited digital space - you'll never find something like a square or flat lines (aside from the one in the center representing the lack of sound) in the real world outside of a lab setting, even if you record a sound that has gone through distortion. If you play a sound represented by a square wave from your computer and record it with a mic, you'll see the recorded sound looks nothing like a square unless your mic is clipping. This is because the square shape relies on harmonic content beyond our hearing range, and beyond the range our speakers are meant to reproduce in the real world.

Digital Audio VS the Real World

Digital audio, to put it shortly, is like hypothetical sound that doesn't really exist yet. The waveforms you see on your computer screen are simply a representation of something for hardware like your speakers to attempt to reproduce. Although I did say a waveform shouldn't be compared to something like a filmstrip, for the sake of simplicity the limitations of audio in the digital world can still be compared to how we use "frames per second" for video, since we can't make smooth calculations for potentially infinite moments in time. Where video has a framerate, digital audio has both a sample rate and a bit depth.
Remember how above in "Understanding Sound" I mentioned that a waveform can only be in one position at each moment in time? Well, to explain bit depth I'm going to need you to imagine a single small waveform - let's say a sine - that can only exist at a limited number of flat positions. If it could only exist at two positions (1 bit), it would collapse into something like a square wave; at three positions it would look more like two square waves stacked on top of each other. Each added bit doubles the number of positions/values the waveform can exist at, and once we reach 16 bits it can become indistinguishably smooth - even though it's still made up of only a limited number of flat positions/values.
These positions are spread across the full volume range, so a loud sine gets to move across many of them, while a quiet one only uses a small slice of them and ends up relatively more stepped. That lost quality (quantization noise) can become apparent when quiet material is turned up loud. This is why, although 16 bit might seem good enough, 24 bit is preferred while producing.
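If it helps to see the idea in numbers, here is a minimal sketch of bit depth using plain numpy (not how any DAW or audio driver actually implements it): a sine gets snapped to a fixed set of flat levels, and the error shrinks as the bit depth grows.

```python
# A minimal sketch (numpy only) of how bit depth limits a sine to flat levels.
import numpy as np

sr = 44100                              # sample rate in Hz
t = np.arange(sr) / sr                  # one second of time values
sine = np.sin(2 * np.pi * 440 * t)      # a 440 Hz sine in the range -1..1

def quantize(signal, bits):
    """Snap each sample to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    codes = np.clip(np.floor((signal + 1.0) / 2.0 * levels), 0, levels - 1)
    return codes / (levels - 1) * 2.0 - 1.0

coarse = quantize(sine, 2)    # 4 positions: clearly stepped / square-ish
cd_like = quantize(sine, 16)  # 65,536 positions: visually and audibly smooth

# The quantization error shrinks dramatically as bit depth grows:
print("2-bit max error :", np.max(np.abs(sine - coarse)))
print("16-bit max error:", np.max(np.abs(sine - cd_like)))
```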
Sample rate, on the other hand, is essentially how quickly a waveform can move from one of the positions/values mentioned above to the next. If the sample rate of a sound is high, it'll be able to move more quickly from one position to the next. Quicker movement between positions means potential for sharper changes in the waveform, and sharper changes in a waveform mean potential for higher frequency content.
This also means that sample rate affects the maximum frequency a waveform can contain. To put it simply, you can't have higher frequency sounds with low sample rates. With that said, imagine another sine - since sound only exists at one position at a given moment in time, a simple sine is still all that's needed for this example. The highest pitch humans can hear is considered to be around 20,000Hz, and the higher a pitch is, the more quickly the waveform is changing positions. Since a waveform has two peaks per cycle, one at the top and one at the bottom, the sine needs to be sampled at least twice per cycle - once for the value of each peak. This is why a sample rate of 40,000Hz, or samples per second, would be needed in order to reproduce a 20,000Hz sine wave. Because of this, plus a small margin for technical reasons that are unnecessary to go into here, the standard sample rate to use when producing is considered to be 44.1kHz.
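A rough numpy illustration of that limit (a sketch, not part of any production workflow): a sine below half the sample rate comes through at its own frequency, while one above it "folds back" (aliases) to a lower frequency.

```python
# Sample a sine and report where its energy actually lands in the spectrum.
import numpy as np

def dominant_frequency(freq_hz, sample_rate, seconds=1.0):
    n = int(sample_rate * seconds)
    t = np.arange(n) / sample_rate
    x = np.sin(2 * np.pi * freq_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, 1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(18000, 44100))  # ~18000 Hz: below Nyquist, reproduced fine
print(dominant_frequency(30000, 44100))  # ~14100 Hz: above Nyquist, aliased down
```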

Distortion on its own isn't a bad thing. It can be used to get really amazing sounds, but when a waveform is being chopped off on a finished track, that's pretty much never beneficial or something many would ever consider good. There are many kinds of distortion you'll hear about - distortion as an effect, waveshaping, bitcrushing, clipping, soft clipping, hard clipping, digital clipping, amplifier clipping, etc. - but the main thing you need to know is to not let a waveform keep a flat (or "clipped") look anywhere in the final version of a music track you're producing.
Distortion as an effect, on the other hand, is not going to keep the waveform clipped or distorted looking by the end of the mix - or at least, it shouldn't. After applying distortion, you're going to be left with frequencies you generally don't want, so you'll use other effects afterwards - at the very least EQ, and often reverb. For guitars, a variety of effects such as chorus and flanger are usually used after distortion; without them, a guitar with nothing but distortion on it can sound very unpleasant.
Clipping is just a form of distortion and is the word more commonly used for post-production distortion - however, "distortion as an effect" is really just clipping with other effects usually used to clean it up in production. Clipping not only creates frequencies below and above our hearing range, including high-end harmonics, but also noise throughout the spectrum to create those unnaturally shaped "clipped" waveforms. These frequencies, which you most likely won't even notice, should be cleaned up and removed with EQ, since they can cause problems not only for the mix but also for the hardware playing it. Since the flat part can't be reproduced in the real world, an approximation gets played based on the frequencies in the playable range of the speaker, which is louder than the waveform can represent. Even sounds such as square waves, as mentioned above in "Understanding Sound", should not be left with their square shape - try removing the low, seemingly unused frequencies of a square, then watch as the shape changes into something that doesn't look clipped anymore while keeping the same sound (you can also consider this proof that you can't rely on only your ears when it comes to music production).
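You can see the added frequency content for yourself with a small numpy sketch (values here are arbitrary, just for illustration): hard-clipping a clean sine fills the spectrum with harmonics that were never in the original.

```python
# Hard-clip a clean sine and compare which frequencies rise above -40 dB.
import numpy as np

sr = 44100
t = np.arange(sr) / sr
sine = np.sin(2 * np.pi * 200 * t)           # a clean 200 Hz sine
clipped = np.clip(sine * 4.0, -1.0, 1.0)     # push it 12 dB too hot, then chop the tops off

def loud_bins(x, threshold_db=-40):
    spectrum = np.abs(np.fft.rfft(x))
    spectrum /= spectrum.max()
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return freqs[20 * np.log10(spectrum + 1e-12) > threshold_db]

print(loud_bins(sine)[:5])     # essentially just 200 Hz
print(loud_bins(clipped)[:5])  # 200 Hz plus 600, 1000, 1400 Hz... (odd harmonics)
```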
When you clean up these sounds with EQ, you may notice the waveform now takes up more peak room while sounding just as loud as it did when it was clipped and compressed by those extra frequencies - and this is part of why clipping can be harmful. The sound truly is louder than it's supposed to be when it's clipping, and more energy is required for the hardware to give the output it needs to when reproducing a clipped waveform. Speakers have a limit on how loud they can play something, and if one is already at full volume while trying to play something clipped, it can be pushed way over its limit.
Now that we know the difference between distortion as an effect and clipping on a final track render, I want to make it clear that having your rendered, "finished" track clipping is bad. It's not good at all, despite the fact that some people will argue it's okay - artistic liberty has no place here, as this is more along the lines of math. It's objectively bad for a multitude of reasons, even if the kind of music you want to make is extreme digital hardcore or gorenoise. Put simply, there's just no good reason to purposely allow your music to clip - all it does is take away parts of your track in a way that's irrecoverable, and it can bring about random issues, even causing potential interference and harsh dissonance with random frequencies.
Moving on, now we can talk about direct and alternating current - mainly direct current. Direct current (DC) is basically what speakers get when they try to reproduce the flat part of a waveform, whereas alternating current (AC) is what they get from a waveform with the usual parts that go up and down, without flat horizontal lines. The only exception is when the flat line is silence with no DC offset. What is DC offset? Imagine a horizontal line going through the center of a waveform - this is the zero line (the points where the waveform crosses it are the zero crossings), the imaginary line you could say the waveform wraps itself around. DC offset is the mean displacement of the waveform from where that center line should normally be, and although technically it's always changing as a wave moves, it takes extra, unnecessary energy to hold the center of a wave anywhere that isn't where the center should normally be. Another downside of a DC offset that isn't 0 is that, if for example the offset is pushing the wave all the way up, the top of the wave will be clipped with only the bottom part left un-clipped - and even at a low volume, it'll be forcing the output hardware to work about as hard as if the sound were being played at nearly max volume.
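Here is a small numpy sketch of the idea: a static DC offset is just the signal's mean sitting away from zero, and it eats headroom without sounding any louder. (Real DC-removal tools typically use a gentle high-pass filter; subtracting the mean, as below, only illustrates the concept for a constant offset.)

```python
# Measure and remove a constant DC offset from a test signal.
import numpy as np

sr = 44100
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 100 * t)
offset_signal = clean + 0.3                  # everything shifted up by a constant 0.3

print("DC offset   :", np.mean(offset_signal))  # ~0.3, should be ~0 for healthy audio
centered = offset_signal - np.mean(offset_signal)
print("after removal:", np.mean(centered))      # ~0.0

# The shifted copy also clips sooner: its peak is 0.8 instead of 0.5,
# so it wastes headroom while sounding no louder.
print("peaks:", np.max(np.abs(clean)), np.max(np.abs(offset_signal)))
```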
Then there are sudden vertical jumps in a waveform, where it shifts instantly relative to the zero line. This sudden change in a waveform can cause pops and clicks, and you could consider it a sort of distortion too. You'll notice it most commonly when an audio track is suddenly stopped in some DAWs without a proper transition back to the zero line, usually from a sample being cut improperly, or in a sudden change between two sounds without a transition and without the waveforms actually connecting to each other. Most DAWs handle this automatically, I think (and media players especially, of course), but it's worth keeping in mind - especially if you do editing in something like Audacity or in programs meant for video editing.
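The usual fix, shown here as a quick numpy sketch with made-up fade lengths, is a few milliseconds of fade at the cut points so the waveform reaches zero smoothly instead of jumping.

```python
# Apply short linear fade-in/out so a trimmed clip starts and ends at zero.
import numpy as np

def fade_edges(audio, sample_rate, fade_ms=5.0):
    out = audio.copy()
    n = int(sample_rate * fade_ms / 1000.0)
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp          # fade in
    out[-n:] *= ramp[::-1]   # fade out
    return out

sr = 44100
t = np.arange(sr // 10) / sr                     # a 100 ms clip
clip = np.sin(2 * np.pi * 440 * t + 1.0)         # cut mid-cycle: starts/ends off zero
print(clip[0], clip[-1])                         # non-zero edges -> audible click
smoothed = fade_edges(clip, sr)
print(smoothed[0], smoothed[-1])                 # edges now at zero
```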
Anyways, getting back to clipping - everything above about the waveform itself is what people would refer to as digital clipping. Digital clipping or direct current on its own isn't always going to damage hardware, strictly speaking, but it does open the door to potential damage, and it makes the hardware work harder while generating more heat. At that point it can be called amp or speaker clipping, and once you can hear it, usually at high volumes, damage is already being done. This doesn't mean playing Game Boy music relatively quietly through your car speakers will fry them - just that it's many times more likely to at higher volumes than any other music that's been properly produced without things like raw squares. To put it simply, playing music at half of max volume that doesn't look like a brick wall and isn't clipping is like pushing against a wall with a Socker Bopper on your hand, while loudly playing something that's clipping is like punching your wall with brass knuckles - some walls are thicker or thinner and anything can cause damage if you hit hard enough, but there's a much higher chance you'll do damage hitting hard with the latter, with how hard you're hitting standing in for volume.

Interference is basically the way two different waves interact with each other in their limited space, whether in theoretical digital space or otherwise. Constructive interference is when two waves of the same kind come together and amplify each other because their positive and negative peaks line up. Destructive interference is when two waves of the same kind cancel each other out because their positive and negative peaks are opposed. This is one of the reasons we want to avoid tracks having to use the same interfering frequencies when mixing - we don't always know where the positive and negative peaks are for every instrument recording - but there are still a few more things to keep in mind than just that.
You probably already knew about these kinds of interference, but have you considered what happens when two copies of the same wave are played together starting at different times? That's right - because the positions of the positive and negative peaks are constantly changing as they oscillate, their relative alignment basically depends on when each sound starts playing. You might think that playing two of the same sound at once will just amplify it, but they could very well cancel each other out if played with just the right timing difference in that limited space.
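A small numpy demonstration of that timing point (arbitrary test tone, not any particular instrument): the same two sounds can double or cancel depending only on when the second one starts.

```python
# Sum two identical sines: once aligned, once delayed by exactly half a cycle.
import numpy as np

sr = 44100
freq = 100.0
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * freq * t)

half_cycle = 1.0 / freq / 2.0                            # 5 ms for a 100 Hz tone
b_in_phase = np.sin(2 * np.pi * freq * t)                # starts at the same moment
b_opposed  = np.sin(2 * np.pi * freq * (t - half_cycle)) # starts half a cycle later

print(np.max(np.abs(a + b_in_phase)))  # ~2.0  (constructive: peaks line up)
print(np.max(np.abs(a + b_opposed)))   # ~0.0  (destructive: peaks cancel)
```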
It's easy to simply decide not to use overlapping notes at the same frequencies, but you still have to take your filters and effects into consideration (and even the recording room). As long as you're using just about any form of reverb, delay, chorus, flanger, etc., you're going to have that interference. If you've used delay before, it shouldn't be hard to notice that it works by taking the sound and playing it again and again at decreasing loudness, usually while panned around; reverb works in much the same way, just denser, with many more copies of the sound. The dynamics given to sounds by the phasing from reverb aren't necessarily something to completely avoid, because they lead to a more natural (although less stable and consistent) sound. What matters is that sounds with more reverb or delay are left quieter, to keep headroom in the mix and so they don't randomly clip. Some sounds with strong reverb can double in loudness at random moments, or become that much quieter when summed to mono, so be cautious about that.
As for bass and sub when it comes to things like reverb - it wouldn't be too bad to have it on those lower frequencies if the only consequence were a somewhat fluctuating volume, but that's only the case if you're using a single steady frequency with no other pitches, bends or changes. The problem is that anything with reverb over it leaves a sort of trail of the previous frequencies played, potentially even causing dissonance. Most kick drums have a higher pitch sliding down to a lower pitch, and reverb puts that in conflict with all the frequencies in between - but it's not just about dissonance. Even if you don't mind that, those in-between frequencies can be much more destructive than simple interference from sounds trying to share the same frequency.
Effects like chorus, flanger and even some forms of delay copy the sound to slightly different pitches, so it might seem like they shouldn't cause the problems mentioned above with reverb and regular delay, since they don't copy the sound onto the same pitch. So why can it be even worse when slightly different frequencies are used at once? They're not taking up the same space anymore, so you might think they shouldn't have any of those interference issues, but it's a bit more complicated than that. Remember, all sound reaching you is one combined wave that can only be in one position at a time. When you play two close notes at once, that single wave has to carry both, and their interaction causes a phenomenon known as phasing: there are relatively short periods in which the two partially or fully cancel each other out and the combined sound dips away.
How long those dips last depends on how fast the phasing is happening, and that depends both on how high or low the frequencies are and on the distance between the two frequencies causing it. Normally, the interaction between two full notes is too fast for us to really notice, but if you use sines at low frequencies and put them only a note apart, you'll likely start to notice them briefly canceling each other out. The same can happen with higher frequencies if you detune one from the other by microtones - it will seem almost like the volume is going in and out. As the distance between the frequencies increases, the speed of this apparent volume change also increases until it finally becomes unnoticeable and the two tones simply sound like they both exist at once, together. This is referred to as "beats" - not to be confused with "beats" in the context of "I'm making some sick beats", but that's not too important.
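A rough numpy sketch of those "beats" (frequencies chosen arbitrarily): two sines a few Hz apart sum into a tone whose loudness pulses at the difference between them.

```python
# Two sines 4 Hz apart: the summed loudness rises and falls about 4 times per second.
import numpy as np

sr = 44100
t = np.arange(sr * 2) / sr                  # two seconds
f1, f2 = 220.0, 224.0                       # 4 Hz apart
mix = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Print the loudness of successive 50 ms chunks; it visibly pulses ~4 times/sec.
chunk = sr // 20
rms = np.sqrt(np.mean(mix[: len(mix) // chunk * chunk].reshape(-1, chunk) ** 2, axis=1))
print(np.round(rms, 2))
```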
It's more common to run into these "beats" in the lower frequencies when using effects like the ones mentioned above, because there are fewer frequencies to work with in the lower ranges than in the higher ranges. The sub range covers only around 20 - 100Hz of the whole spectrum, with the number of frequencies available per octave doubling as you go from low to high. The spacing you should generally keep between two frequencies in use down there is around 30Hz; two frequencies closer than that can usually be heard interfering with each other, which is why those kinds of effects, and even the use of harmonics in the sub-bass area, are generally considered a bad idea.
Finally, since sound waves can still interfere with each other in the real world after coming out of speakers, and since plenty of people still listen in mono or only have a single mono speaker the way many phones do, it's best to do your mixing in mono so that you can hear all of these potential causes of interference.
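If you want a quick sanity check outside your DAW, here is a hedged sketch of the mono fold-down idea (the polarity-flipped "wide" signal is a deliberately extreme, made-up example): sum left and right and compare levels to catch material that mostly disappears on mono playback.

```python
# Fold a stereo pair to mono and compare levels.
import numpy as np

sr = 44100
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 300 * t)
right = -left * 0.9            # a "wide" effect made by flipping polarity (worst case)

mono = (left + right) / 2.0
def rms(x): return np.sqrt(np.mean(x ** 2))
print("stereo RMS:", rms(left), rms(right))
print("mono RMS  :", rms(mono))   # far quieter: the sides largely cancel in mono
```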

We covered above why low frequency sounds generally shouldn't have effects like chorus and reverb, especially the lower you go, but that brings us to potential conflicts with other sounds that use the lower frequencies. Most music has a bass track and percussion that includes a bass drum sound, and this is a common area that people new to mixing have trouble with.
As a quick reminder of two important things from earlier - a difference of at least around 30Hz should be kept between sounds, and there are fewer frequencies in the lower range, which means there's less room to work with and it's easier for sounds to interfere. When working with a melodic bass sound and a kick drum, you have to decide what method, or combination of methods, you'll use to keep their frequencies separated.
1. You can separate the frequencies by having the bass drum use the sub-bass and the melodic bass sound use the... well, bass range. This is usually done by default along with the other methods below, but in electronic music that's meant to have strong synth sub-bass it can be lacking. If you're doing an electronic genre with extremely frequent kick drums while the track is meant to focus more on a melodic sub (pretty much just speedcore sub-genres?), having the bass drum use the higher bass and the synth use the sub-bass can work as well, although it's pretty uncommon.
2. Going back to the problem mentioned above with bassy electronic music, you can have the melodic bass and kick drum share both the sub frequencies and the higher bass frequencies by ducking the melodic bass under the kick with side-chaining whenever the kick comes in (a rough sketch of this kind of ducking appears after this list). In this case, though, if the kick is frequent, you won't hear the melodic sub-bass very often, so it's better to have the strength and decay of the ducking depend on how infrequent the kick is. This can also be done lightly in combination with the first method, even when the sub frequencies aren't shared, to give the kicks a stronger focus. Just don't overdo it - less of this is better for a more realistic sound.
3. If the kick is so frequent that side-chaining would just keep out any other sub sounds, while you still want there to be a melodic sub, another option is a finely tuned sine for the sub-bass. This is the least common option and can take away some of the kick drum's potential impact; it's more commonly done in older electronic music. In more detail, it's basically just a sub-bass sine that imitates everything the melodic bass is supposed to do, while all the kicks are also imitated with it and pitched to the same notes the melodic bass would be.
Note that these approaches can also be applied to snares or other percussive sounds if they're meant to use lower frequencies, or frequencies within 30Hz of one of the other lower frequency sounds you're using, of course.
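As promised in option 2, here is a very simplified numpy sketch of the ducking idea. A real side-chain compressor derives the gain envelope from the kick's own level; here the kick times, depth and release are written in by hand just to show the shape of what happens to the bass.

```python
# Duck a sustained sub line under kicks using a hand-made gain envelope.
import numpy as np

sr = 44100
length = sr * 2                                   # two seconds at 120 BPM = 4 beats
t = np.arange(length) / sr
bass = 0.5 * np.sin(2 * np.pi * 55 * t)           # a constant 55 Hz sub line

duck = np.ones(length)
kick_times = [0.0, 0.5, 1.0, 1.5]                 # kicks on every beat
release = int(0.25 * sr)                          # how long the bass takes to come back
for kt in kick_times:
    start = int(kt * sr)
    end = min(start + release, length)
    # drop the bass to 10% at the kick and ramp it back up over the release time
    duck[start:end] = np.minimum(duck[start:end], np.linspace(0.1, 1.0, end - start))

ducked_bass = bass * duck
print("gain right at a kick:", duck[int(0.5 * sr)])    # 0.1 -> bass pulled down
print("gain between kicks  :", duck[int(0.45 * sr)])   # 1.0 -> bass back at full level
```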

Bass, percussion, melody, harmony, ambience - all things that make a track feel complete, and they all have to be mixed together at separate frequencies so they don't step on each other. I think the best place to start is having an idea of how full you want your track to be and what kind of sound you're going for, then deciding for each sound how many of its strong frequencies you want to focus on. Before that, just get rid of all the unused frequencies. If you're only going to have something simple like some vocals and a piano, you can probably let your sounds keep more width - but if your track is going to use a ton of sounds, you'll want each individual sound to use fewer frequencies. After boosting the frequencies you want over others, you can lower the volume of the weaker areas in the sound that you either want less focus on or that lack meaningful activity. If the sound has bass or sub activity, you should split the sound by its frequencies and route the bands to separate mixer inserts so you can treat those frequencies differently when applying further effects such as reverb afterwards.
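For anyone curious what that band splitting looks like outside a DAW, here is a hedged scipy sketch (Butterworth filters at a made-up 150 Hz crossover; DAW EQs and crossover plugins behave similarly but not identically).

```python
# Split a sound into a low band and a high band so they can be treated separately.
import numpy as np
from scipy.signal import butter, sosfilt

sr = 44100

def split_bands(audio, crossover_hz=150.0, sample_rate=44100, order=4):
    """Return (lows, highs) so bass-range content can be processed on its own."""
    sos_lo = butter(order, crossover_hz, btype="lowpass", fs=sample_rate, output="sos")
    sos_hi = butter(order, crossover_hz, btype="highpass", fs=sample_rate, output="sos")
    return sosfilt(sos_lo, audio), sosfilt(sos_hi, audio)

# Example: a pad that contains both 60 Hz rumble and an 800 Hz tone.
t = np.arange(sr) / sr
pad = 0.4 * np.sin(2 * np.pi * 60 * t) + 0.4 * np.sin(2 * np.pi * 800 * t)
lows, highs = split_bands(pad, crossover_hz=150.0, sample_rate=sr)
# 'highs' can now be sent to reverb etc. while 'lows' stays dry, as suggested above.
print(np.max(np.abs(lows)), np.max(np.abs(highs)))
```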
The best sounds to start with are the sounds you want the most focus on. In my opinion, that means your lead sounds (although some people start with their percussion/"beat") - and at every point in your track there will always be one "lead", even if you don't think of it that way. If you have an arpeggio in the background and the main lead stops, that arpeggio becomes your new lead, and at that point in the mix it should have the EQ to be the clearest thing playing. Along with that, since anything melodic can technically sit at any frequency or pitch if it's a synth, you should treat any part that moves into a different octave as a different sound altogether when shaping its EQ.
After leads, you'll usually want to work on the percussion EQ, going through frequency ranges from highest, then lowest, then second highest, and so on - or starting from the low end, depending on whether the lead sound sits in a higher or lower range. There's no solid cut-off line that's best, but it's safe enough to say there generally shouldn't be more than one sound at a time using frequencies above 11-13kHz. Some crossover isn't too bad in that range, but 14-16kHz should be reserved for the one sound in the mix that benefits from it most, such as hi-hats. Anything above 17-18kHz is usually unheard (humans can hear up to 20kHz, but that usually only applies to newborns) and can be ignored, and later possibly removed entirely to help save space. If you're doing something loud and heavy, it's usually fine to remove everything above 16kHz, while if you're making something classical-like and dynamic with clean, real instruments, you might want to leave frequencies in at least up to 18kHz (as long as you're not rendering it to .mp3).
In the high range, the areas around 4-8kHz, 10-12kHz and everything above 15kHz can be some of the harshest sounding to our ears and should be given extra attention so they don't get too loud. Vocals are especially infamous for causing unwanted sounds around 4-8kHz with sounds like the "S" sound, but we'll get back to vocal-specific issues in the next section. With that out of the way for hi-hats, we can carefully move on to the next percussion sounds that use the high range: sounds like claps and snares.
The high end here is a bit lower than the hi-hats' and will usually contain frequencies from the lead sound - we want to avoid interference where we can while still keeping the sound's strongest frequencies. The biggest difference between things like snares and claps is that a snare will have a lower frequency "punch" to it, sometimes in the higher bass, sometimes in the low mids. If there aren't frequencies that match up well with enough punch, you can always try layering: take multiple samples, then use some frequencies from one sample and the rest from another. Layering is a technique in its own right which can be used for many sounds that are meant to span multiple frequency ranges, to get a fuller sound.
Around 250-2,000Hz is what would be considered the mid-range, with 250-500Hz being the low mids - an area that can highlight the atmospheric and rhythmic qualities of a track. Although in constant use it has that atmospheric quality, it's also where some weaker snares will land their "punch", as mentioned in the last paragraph. Too much low mids can make something sound underwater-like, cardboard-like or muddy; on the other end, having less of it can make something sound more up-close and clear - though unnatural, flat and artificial if there's not enough of it in the wrong context. Some music can actually benefit from having almost no low mids, but I think the best thing to do is first consider which instruments or sounds you have that can be heard in this frequency range (ambient or rhythm-focused sounds being the best fit). Take out some of these frequencies from the instruments that don't really have their focus here, and then bring the rest up or down depending on the feel you're going for.
Last is the rest of the mid range (ignoring sub and bass, since that has its own section), which contains a little bit of just about every sound and is extra important for melodic sounds. There will probably be a bit of percussion left in the mids, so just give a little boost to the parts you think work best here, working around the lead and using the more spacious frequency areas for whatever is left to add in. By now, if you haven't done the bass/sub sounds, I'd suggest doing that next. After that comes whatever backing melody you have that complements the lead harmonically, and then the main chord progression's rhythm sound or other form of harmony, which will sit more in the lower mids (see the previous paragraph about mids). Non-noisy melodic sounds aren't too much trouble to mix together compared to percussion, which can be like finding the sweet spot in noise at times, as long as no sounds are using the same notes as other sounds in the same octave and you're listening out for at least the highest note used, the lowest note used, and the most common frequencies of the melodic sounds.
If two instruments or sounds are using the same notes, try to decrease the volume of the less important or non-lead sound at the frequencies of the specific notes in conflict, as much as you can without it sounding too unnatural, for just those moments. If there's a wide enough range of frequencies being used and the notes are mostly the same, refer to the end of paragraph 5 earlier in this section on layering. With that out of the way, you should now have all the main things a track would have - melody, harmony, rhythm, bass and percussion.

Let me start by saying that, unlike with any other instrument, there isn't always consistency in the range and frequencies used for vocals. Vocals can be a low bass, they can be way up in the highs, and there's no chart or preset that can tell you what frequencies to focus on, because every voice is different. A lot of voices have problems at varying harsh high frequencies with sounds like the S sound, and most of the strong melody is going to be in the mids (as with just about everything), but that's about it. At the same time, vocals are always a lead unless they're being used for something like background effects (no coherent lyrics, just sounds, as in many EDM tracks) or unless they're the harmony for other vocals which are already the lead.
Sure, there are complex synth sounds and instruments, but vocals add meaning through phonemes, which are not only complex but also need to be clear and heard in order to be understood. If they can't be understood, it can cause frustration or confusion for the listener. This is why vocals need a special spotlight both as a lead and when being mixed - they generally need a significantly higher amount of detail to come through than regular instruments and synths, which can often get by with just their melody being audible.
The first step to a good sounding vocal track is getting it to sound good right from the start, before any filters and EQ. If the raw quality isn't good and you really have no other options, you can always build around it by going with an intentionally low quality sound for the rest of the track - doing so, however, only leaves you with fewer options and less potential clarity. Get the best quality you can to start with, retry at a lower volume if you're getting things like distortion, turn off things like fans if you're recording, take room reverb into consideration, and get it all as close to perfect as possible without autotune, in as many takes as it takes.
For Vocaloid, things like autotune should only be used for very minor details, such as maybe changing the depth of vibrato here and there; you should do as much as you can in the editor alone, without effects, for all the articulations/"parameters" aside from dynamics, which can be added after removing the raw file's main issues in a DAW. Using something like Melodyne may seem like a good idea at first, but it will only smear the qualities of the voice in an unnecessary way, since Vocaloid already exports the vocal in key and already lets you edit anything to do with pitch without potential added issues (and yes, people have tried "correcting" pitch in things like Melodyne).
For autotuning real vocals, I would again recommend that you first get the closest-to-perfect vocals you can, recorded over various sessions. Afterwards, you can edit them together so that only the best parts from the different recordings are used. Then add only what's needed for the vocals to at least be mostly on key. This is only a recommendation for people new to using it, though, so do what you feel comfortable doing if you're more experienced and can recognize its usage - and don't think people won't notice you're using it when you're using more than what's needed.
There are a handful of ways to deal with relatively quiet unwanted sounds, but the best way, in my opinion, is the most obvious one - just cut out all the parts of the track that are meant to be silent and remove the frequencies lower and higher than what's needed. Don't forget to fade in and out when trimming, and keep the EQ curves gentle enough that it doesn't sound unnatural. If you're using Vocaloid, render the vocal track with "dynamics" set to an unchanging volume that keeps it at least around half of full volume - this way you don't have to bring the volume up too much when editing. For anything, having to bring the volume up from extremely quiet to loud can cause issues, similar to how zooming into an image leaves you noticing the individual pixels (see "Digital Audio VS the Real World", paragraph 3).
From there, you'll most likely come across higher frequencies that sound harsh at times. Do not use compression or limiting yet, since it may only make the harsh frequencies more noticeable while drowning out everything else. Instead, what we'll be doing here is called "de-essing". There are many free VSTs for this that you can find with a simple search, but I'm going to attempt to explain how to do it with just EQ and sidechaining, since it's best to have a general idea of how it can be done. First, route the audio to a channel for effects (a mixer insert), then route that to two other inserts - one of which is silenced at the end or disconnected from the master insert. For the one that still goes to the master insert, give it a narrow peaking EQ band and leave its gain at the center for now. For the one that doesn't go to the master insert, take out all the frequencies except a very narrow area containing the most problematic frequencies. Then add a peak controller to it, linked with an inverted relationship to that narrow EQ band on the other insert, which you can tweak later to get it sounding just right. This causes the volume of those specific frequencies to be pulled down based on how loud that area gets, since you don't want to get rid of them altogether. If you want, you can later also bring the mix level in and out based on when you feel the de-essing is needed.
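To make the routing idea more concrete, here is a very rough numpy/scipy sketch of the same principle outside a DAW: measure how loud a narrow "ess" band is over time, and turn only that band down when it spikes. The band edges, threshold and window length are made-up starting points, not rules, and subtracting the filtered band from the original is only an approximation of the parallel routing described above.

```python
# A crude de-esser: dynamically attenuate a narrow sibilance band.
import numpy as np
from scipy.signal import butter, sosfilt

def simple_deesser(vocal, sample_rate=44100, band=(5000.0, 8000.0),
                   threshold=0.05, window_ms=10.0):
    sos = butter(4, band, btype="bandpass", fs=sample_rate, output="sos")
    ess = sosfilt(sos, vocal)                 # just the sibilant band
    rest = vocal - ess                        # everything else, left untouched (approximate)

    # smoothed envelope of the ess band
    win = max(1, int(sample_rate * window_ms / 1000.0))
    env = np.convolve(np.abs(ess), np.ones(win) / win, mode="same")

    # gain dips below 1.0 only while the band is over the threshold
    gain = np.where(env > threshold, threshold / (env + 1e-12), 1.0)
    return rest + ess * gain

# tiny synthetic check: a 300 Hz "voice" with a burst of 6 kHz "ess" in the middle
sr = 44100
t = np.arange(sr) / sr
vocal = 0.3 * np.sin(2 * np.pi * 300 * t)
vocal[sr // 2 : sr // 2 + sr // 10] += 0.4 * np.sin(2 * np.pi * 6000 * t[: sr // 10])
treated = simple_deesser(vocal, sr)
print(np.max(np.abs(vocal)), np.max(np.abs(treated)))  # the ess burst should come down noticeably
```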
Now, after the EQ is applied, we can finally get to compressing and raising the volume to a good level. When compressing, you should only pull down volume that goes slightly over the average, so that any parts that are too loud are held at the intended volume instead. If you're using a real recorded voice, I suggest compressing differently for sections that have different intended levels, since you can't get the raw recording at a consistent volume the way you can with an exported Vocaloid track. For Vocaloid, after all of this you can add dynamics throughout the track from within your DAW. Now all that's left is adding effects like reverb, which is probably best done by routing to another insert, for organization and headroom.
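For reference, the core math of that kind of downward compression is simple; here is a minimal, hedged numpy sketch (real compressors add attack/release smoothing, knee shaping and make-up gain, which are left out here).

```python
# A crude peak compressor: anything over the threshold is pulled back toward it by the ratio.
import numpy as np

def compress(audio, threshold=0.5, ratio=4.0):
    over = np.abs(audio) > threshold
    out = audio.copy()
    out[over] = np.sign(audio[over]) * (threshold + (np.abs(audio[over]) - threshold) / ratio)
    return out

loud = np.array([0.2, 0.4, 0.9, -1.0, 0.3])
print(compress(loud))   # the 0.9 / -1.0 peaks come down to 0.6 / -0.625
```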
Keep in mind, though, that when doing any of these things involving EQ and compression, you should never do it to multiple tracks/parts at once. If your track has a lead and backing vocals, they should be recorded/exported individually as separate files! And as a reminder, the vocals, as leads, should come first in mixing - everything else should fit around them. That doesn't mean you should have vocals before having anything prepared for the rest of the composition itself, though; I'm only talking about the mixing part of things. Composition itself is a story for another day.
 

Nekoshey

Oct 9, 2018
This is a wonderfully written explanation of the basics! Mixing (and the like) has always been a sore point for me personally -- no matter what I make, I'm always dissatisfied with the final product and feel like I could have done much better. And you're right, it's likely been because I jumped straight into just "doing it", without a proper foundation of why we do the things we do for audio mastering. But now I feel like I understand a lot more! It's like all the things I've been doing all these years suddenly "clicked" into place, and make much more sense now -- they have a purpose and a proper function, not just a methodology. Thank you so much for sharing your knowledge, I can't wait to put it to the test! (ノ≧∀≦)ノ ♡ ♡ ♡
 
