Full-Scale War

How loud should masters be for Spotify?

There is a lot of speculation about how loud masters should be for Spotify. Many mastering engineers still provide a one-size-fits-all master that all but guarantees Spotify’s loudness normalisation will turn it down. Is this a bad thing to do?

Here is an evidence-based exploration of how Spotify plays tracks with different loudness levels. In a nutshell, I made nine tracks with different loudness readings*, put them on Spotify, and then ran them through a variety of tests to see how loud they were streamed.

TL;DR: When Spotify is normalising playback, tracks that sit around -14 LUFS-I end up with more dynamic range than tracks with a higher LUFS-I reading. However, this doesn’t mean we should ‘aim’ for -14 LUFS: we should aim for the appropriate loudness and dynamic range for each individual project.

*Generally, I try to avoid the terms “target” or “score” when referring to LUFS, in favour of the term “reading”. As Goodhart’s law puts it: “When a measure becomes a target, it ceases to be a good measure.”

Three waveforms depicting tracks at different loudness levels

Overview

I followed these steps to perform a basic test:

  1. Produce three tracks of different genres

  2. Create three masters for each track (so nine tracks in total), with each master achieving specific loudness readings

  3. Upload these nine tracks to Spotify

  4. Record Spotify directly into Reaper, playing these tracks alongside other music in a variety of settings

  5. Measure the loudness and dynamic content

The original three tracks I made were:

  • “Wildlife” - a tech house track

  • “Meander” - an alt/pop song

  • “Nowhere” - an instrumental hip-hop beat


There were three masters of each track, at three different loudness levels. Here’s what they were and how I will refer to them from here on:

  • “STR”: -14 LUFS-I, -1 dBTP

  • “DIGI”: -9 LUFS-I, 0 dBTP

  • “SHY”: -16 LUFS-I, ranging from -1 to -4 dBTP


You can hear these tracks here and test them for yourself.

(All audio jargon is defined at the bottom of the page, for those who aren’t familiar with terms like LUFS or dBTP.)

Aesthetically pleasing spectrum analysis of a waveform

These intermittent waveforms have no relevance; they are just striking visual dividers

First test: Basic playlists

I made three playlists, stylistically matched with each of my three tracks. I then added all three versions of my track to their respective playlists, and streamed the playlist whilst recording the output of Spotify. The results are tabled below, but in summary:

  • The “STR” tracks were mostly equal to or marginally higher than the “DIGI” tracks for integrated loudness

  • The “STR” tracks all had a slightly higher short-term LUFS peak than their “DIGI” counterparts

  • The “STR” tracks all had a greater loudness range than their “DIGI” counterparts (more dynamic range)

Table delineating different loudness metrics of streamed audio

So in these results, the STR tracks (-14 LUFS-I, -1 dBTP) all came out the same as or marginally louder than their DIGI counterparts, but with a significant advantage in dynamic range.

After doing this test I made a few changes:

  1. Having all three versions of each track in each playlist was obviously going to impact the overall LUFS-I: they needed to be tested in duplicate playlists that only had one version of each track.

  2. The “SHY” masters didn’t really need to be tested any further; they were always going to be the quietest in LUFS and decibels. Plus, it was only ever -14 LUFS vs. -9 LUFS that most people were interested in, right?

Spectrum analysis of a waveform in striking colours

Second test: Shuffling around

I wanted to try out the common scenario of a free-flowing listening session, where someone adds songs to a queue and skips from genre to genre. This should make Spotify’s loudness control behave differently, because it can’t see into the future and normalise an entire playlist.

To do this, I played six songs, then played the “STR” version of “Wildlife”, and then quit Spotify (to purge the listening session). I then repeated this for the rest of the tracks.

The results were predictable: Spotify plays each track at a pre-normalised -14 LUFS-I. Each “DIGI” track has been turned down by almost exactly 4 dB, to squeeze the -9 LUFS-I into -14 LUFS-I.

Summary of results:

  • All tracks now hit exactly -14 LUFS-I.

  • The short-term LUFS peak was marginally higher for each “STR” version compared to its “DIGI” counterpart

  • Moderate to significant differences in loudness range: all “STR” versions had greater LRA

More loudness data of mastered tracks from streaming services

Seeing the waveforms side by side paints a clear picture of the dynamic difference. Here I’ve got the three tracks grouped together, with the STR versions on the left of each pair.

Six waveforms visually juxtaposing the different dynamic range for different audio masters

These waveforms do have relevance; that’s why I made them uglier

Third test: “Competitive” loudness 

In the third experiment, I wanted to see how the tracks compared to other music in terms of loudness and dynamics. 

The idea of ‘competitive loudness’ is crucial in situations where your track has been added to a Spotify playlist and is played back to back with other music of the genre: does yours sound quieter, or louder?

So I made three new playlists drawing on the stylistically relevant Spotify playlists, i.e. “Tech House Operator” for Wildlife, “Front Left” for Meander, and “chill lofi study beats” for Nowhere. I then duplicated each playlist and added the STR version to one copy and the DIGI version to the other.

The objective was to compare the tracks to their real-world counterparts, rather than to one another. Here are the results:

  • Both versions of “Wildlife” were louder (LUFS-I) and more dynamic (LRA) than the average in the tech house playlist. “STR” was more dynamic and slightly louder than “DIGI”.

  • Both versions of “Meander” were louder and more dynamic than the average in the alt/pop playlist. “STR” was more dynamic and slightly louder than “DIGI”.

  • Both versions of “Nowhere” were marginally louder (LUFS-I) than the average in the chill beats playlist. “STR” had a greater loudness range and higher dBTP than “DIGI”.

Loudness data from Spotify: test results that show differences in loudness and dynamic range for different masters
More loudness data from Spotify: test results that show differences in loudness and dynamic range for different masters

Conclusion

We shouldn’t worry too much about LUFS-I when mastering; instead, we should let the music dictate how loud and dynamic it should be.

It must be said that, in LUFS specifically, the “STR” tracks were only ever marginally louder, and in fact usually had the same integrated loudness. You probably don’t need to lose too much sleep over the integrated loudness of your music on Spotify: most music will end up getting played back at around -14 LUFS-I when normalisation is turned on.

As expected, the distinct difference in these tests was in the dynamic range (LRA). With this in mind, consider that dynamics can really increase the impact of your music, helping it stand out and feel more emotional.

We shouldn’t worry too much about LUFS-I when mastering; instead, we should let the music dictate both how loud and how dynamic it should be. Loudness and dynamics vary dramatically from project to project: always trying to “hit” a LUFS target, whether that be -14, -9 or anything else, is missing the forest for the trees when ears and taste should always be at the forefront.

Please feel free to test these tracks further and let me know if there are any interesting results. If you would like the original files I uploaded for these tests, get in contact and I’ll pass them along.

Mono waveform of a mastered audio file, in black and blue colouring



A few postscripts

  1. This article doesn’t explore any streaming services other than Spotify, although the tracks are available on most platforms and I might test them at a later date.

  2. If you want to quickly find out how much your masters will be turned down by DSPs (digital service providers, i.e. streaming platforms), you can use a great free tool called Loudness Penalty. I ran these tracks through it and its estimates were fully in line with my own results, so I believe it works well.

  3. Here are the Spotify settings I was using during testing:

Spotify listening settings for the testing conducted
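
For a rough sanity check of postscript 2, the nominal gain a normaliser applies is simply the target level minus the track’s integrated reading. The sketch below assumes a static gain towards a -14 LUFS-I target and, for quiet masters, a boost capped so that the true peak stays at or below -1 dBTP. The function name, its parameters and the headroom cap are illustrative assumptions, not Spotify’s published algorithm or the code behind Loudness Penalty.

```python
def nominal_gain_db(lufs_i, true_peak_dbtp, target_lufs=-14.0, peak_ceiling_dbtp=-1.0):
    """Estimate the static playback gain (dB) a loudness normaliser would apply.

    Assumptions (illustrative only): the normaliser aims for target_lufs,
    turns loud tracks down by the full difference, and only boosts quiet
    tracks as far as the headroom below peak_ceiling_dbtp allows.
    """
    gain = target_lufs - lufs_i
    if gain <= 0:
        return gain  # at or above the target: turned down by the full difference
    headroom = max(0.0, peak_ceiling_dbtp - true_peak_dbtp)
    return min(gain, headroom)  # quieter than target: boost limited by headroom


# Nominal readings of the three masters described in this article.
masters = {
    "STR": (-14.0, -1.0),
    "DIGI": (-9.0, 0.0),
    "SHY (peak -4 dBTP)": (-16.0, -4.0),
    "SHY (peak -1 dBTP)": (-16.0, -1.0),
}

for name, (lufs_i, peak) in masters.items():
    gain = nominal_gain_db(lufs_i, peak)
    print(f"{name}: {gain:+.1f} dB -> plays at about {lufs_i + gain:.1f} LUFS-I")
```

This is only the nominal offset implied by the readings; figures measured from a real stream can differ slightly from it.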

Definitions

LUFS: Abbreviation of “Loudness Units relative to Full Scale”. The modern standard for measuring loudness: it weights the signal to reflect how humans perceive sound and reports the result on a decibel-style scale relative to digital full scale, so a change of 1 LU equals 1 dB.

LUFS-I: Also known as “Integrated Loudness” or long-term LUFS, it is short for “Loudness Units relative to Full Scale - Integrated”. A LUFS reading that considers an entire track, album or any long-term passage of audio.

LUFS-S: Short-term LUFS, measured over a sliding 3-second window. The peak LUFS-S reading gives some indication of how loud the loudest point of a track is.

dBTP: Decibels True Peak. A more precise way of measuring peak level that takes into account inter-sample peaks, i.e. peaks in the reconstructed waveform that fall between digital samples.

LRA: Loudness Range. A measurement of the dynamic range of audio, expressed in LU (loudness units), in which a higher number signifies more distance between the quietest and loudest portions.

Listening session: Spotify’s determination of when someone is listening continuously; relevant here as it factors into loudness normalisation.
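
To make the LRA definition above more concrete, here is a minimal sketch of how a loudness range figure can be derived from a series of short-term loudness readings. It follows the gating idea described in EBU Tech 3342 (an absolute gate at -70 LUFS, a relative gate 20 LU below the gated mean, then the spread between the 10th and 95th percentiles). The function name and the nearest-rank percentile are simplifying assumptions, and the input readings are hypothetical, not measurements from these tests; a real meter will also handle windowing and overlap that this sketch leaves out.

```python
import math


def loudness_range(short_term_lufs):
    """Approximate LRA (in LU) from a list of short-term loudness readings."""
    # Absolute gate: ignore silence and near-silence.
    gated = [v for v in short_term_lufs if v >= -70.0]
    if not gated:
        return 0.0

    # Relative gate: 20 LU below the mean, averaged in the power domain.
    mean_power = sum(10 ** (v / 10) for v in gated) / len(gated)
    threshold = 10 * math.log10(mean_power) - 20.0
    gated = sorted(v for v in gated if v >= threshold)

    # Nearest-rank percentiles (a simplification of the reference method).
    def percentile(p):
        return gated[round(p * (len(gated) - 1))]

    return percentile(0.95) - percentile(0.10)


# Hypothetical short-term readings, not data from this article.
steady = [-14.0] * 20                      # no variation at all
dynamic = [-24.0] * 10 + [-12.0] * 10      # quiet verses, loud choruses
print(loudness_range(steady), loudness_range(dynamic))
```

The percentile step is why LRA ignores brief extremes: a single very loud or very quiet moment barely moves the figure, whereas sustained contrast between sections does.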
