2026-05-13 · by Zara · ~12 min read

What the 13 audio traits actually tell you about a song.

Every everysong readout starts with a card of 13 numbers describing the audio you uploaded. Most people glance at the BPM and key and skip the rest. That's a missed opportunity: the other 11 traits each answer a useful question about the song, from "is this mix actually loud enough?" to "will this track feel like dance music or chill electronic on the dance floor?". This is the no-jargon guide to what each trait measures, what counts as a typical value, and what to do with it.

Two ground rules before we start. First: eight of the thirteen traits are "GREEN" tier: they're measurements made by signal-processing algorithms with decades of academic research behind them. You can quote them in a mix note or a brief without hedging. Second: five traits are "AMBER" tier: they're outputs of pre-trained ML classifiers. Trust them as a useful second opinion, not as ground truth. The label is shown next to each reading in the readout so you don't have to remember which is which.

The 8 GREEN traits: signal-processing measurements

GREEN · TEMPORAL

01 · BPM (Tempo)

Beats per minute. The single most useful number for almost everything in music: matching cuts to the beat in video, beatmatching for DJs, picking a runner's playlist, finding tracks that'll feel right at the same energy level.

What's typical: 60-80 BPM = ballads, lo-fi, ambient. 80-120 BPM = pop, hip-hop, soul, midtempo electronica. 120-140 BPM = house, disco, mainstream EDM. 140-160 BPM = techno, fast rock. 160+ BPM = drum & bass, hardcore, speed metal.

Common gotcha: the engine can sometimes report half-time or double-time. A 75 BPM ballad with a syncopated kick can read as 150 BPM. Cross-check by listening: if the BPM "feels wrong," it's probably half/double.

Measured by: librosa beat tracker (autocorrelation over onset envelope) · Unit: BPM · Typical range: 50-200
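
A minimal sketch of the half/double cross-check described above, assuming you only have the single BPM number from the readout. The function name and the 50-200 window (taken from the typical range quoted above) are illustrative choices, not part of the everysong engine.

```python
def tempo_candidates(bpm, low=50.0, high=200.0):
    """Return the reported tempo plus its half- and double-time readings.

    low/high default to the 50-200 BPM typical range quoted in this post;
    candidates outside that window are dropped.
    """
    candidates = [bpm / 2, bpm, bpm * 2]
    return [c for c in candidates if low <= c <= high]


print(tempo_candidates(150))  # [75.0, 150]
```

If the reported number "feels wrong," tap along at each candidate and keep the one that matches your foot.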

GREEN · HARMONIC

02 · Musical Key

The tonic and mode of the song. Reported as a key name like "C major" or "F# minor". Useful for: harmonic mixing (DJs use the Camelot wheel), finding tracks that'll layer cleanly together, identifying songs that share a tonal centre with your reference.

What's typical: the most common keys in popular music are C major, G major, D major, A major, E minor, A minor, D minor: they're easier to sing and play on guitar/piano. Minor keys dominate sad/melancholic music; major dominates upbeat/celebratory.

Common gotcha: modulation. Songs that change key mid-track will report the dominant key only. Modal tracks (Dorian, Mixolydian) might be misclassified as their nearest major/minor relative.

Measured by: Krumhansl-Schmuckler key-profile algorithm · 24 possible values: 12 keys × {major, minor}
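
For the curious, here is a rough sketch of how a Krumhansl-Schmuckler style estimate can work: correlate a 12-bin pitch-class (chroma) vector against the Krumhansl-Kessler major and minor profiles at every rotation and keep the best of the 24 scores. It assumes you already have a chroma vector with index 0 = C; the function names are illustrative, and this is not necessarily how the everysong engine implements it.

```python
from math import sqrt

# Krumhansl-Kessler probe-tone profiles, index 0 = tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def estimate_key(chroma):
    """chroma: 12 pitch-class energies, index 0 = C. Returns e.g. 'F# minor'."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic weight lines up with `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(chroma, rotated)
            if best is None or score > best[0]:
                best = (score, f"{NOTES[tonic]} {mode}")
    return best[1]
```

Because only the 24 major/minor templates are scored, a Dorian or Mixolydian track has nowhere to land except its nearest major/minor relative, which is the gotcha described above.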

GREEN · LOUDNESS

03 · LUFS (Integrated Loudness)

Loudness Units relative to Full Scale, integrated over the whole track. This is the perceptual loudness: how loud the song actually sounds, not just how loud the peaks are. Streaming services (Spotify, Apple Music, YouTube) normalise tracks to a target LUFS, so this number tells you whether your mix will be turned up or down by the platform.

What's typical: Spotify normalises to -14 LUFS. Apple Music to -16 LUFS. YouTube to -14 LUFS. So if your track measures -14 LUFS, Spotify and YouTube will play it back unchanged, and Apple Music will turn it down by about 2 dB. -8 LUFS (a typical loud pop master) gets turned DOWN. -20 LUFS (a quiet acoustic ballad) might get turned UP, but most platforms cap the upward boost.

Use it for: mix decisions. If your reference track is -8 LUFS and yours is -16, you're 8 dB quieter, and even after streaming services normalise both tracks, yours can still sound thinner side by side.

Measured by: ITU-R BS.1770 integrated loudness · Unit: LUFS · Typical range: -23 (broadcast) to -6 (very loud pop)
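
The normalisation arithmetic is simple enough to sketch. This assumes the platform targets quoted above and treats the upward-boost cap as a hypothetical parameter (`max_boost_db`), since the real cap differs per platform and isn't specified here.

```python
# Targets as quoted in this post; platforms can change them.
TARGETS_LUFS = {"spotify": -14.0, "apple_music": -16.0, "youtube": -14.0}


def normalisation_gain_db(measured_lufs, platform, max_boost_db=None):
    """Gain in dB a platform would apply to reach its loudness target.

    max_boost_db is a hypothetical cap on upward gain; pass None for no cap.
    """
    gain = TARGETS_LUFS[platform] - measured_lufs
    if max_boost_db is not None and gain > max_boost_db:
        gain = max_boost_db
    return gain


print(normalisation_gain_db(-8.0, "spotify"))  # -6.0
```

A negative result means the platform turns the track down; a positive one means it turns it up, subject to whatever cap applies.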

GREEN · TIMBRAL

04 · Spectral Centroid

The "centre of mass" of the audio's frequency spectrum. A high spectral centroid means the average energy is up in the high frequencies: brittle, bright, "tinny" sounds. A low centroid means the energy concentrates in the lows: dark, warm, "fat" sounds. Loosely, this is the engineer's term for what most listeners call "brightness."

What's typical: heavy metal mixes often around 2,000-3,500 Hz. Acoustic singer-songwriter around 1,500-2,500 Hz. Lo-fi hip-hop with low-pass filtering around 800-1,500 Hz. Crystal-clear EDM masters around 3,000-5,000 Hz.

Use it for: A/B-ing your mix against a reference. If yours measures 1,200 Hz and your reference is 2,800 Hz, your high end is missing: you've over-darkened the mix.

Measured by: weighted mean of frequency bins · Unit: Hz · Typical range: 500-5,000
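
The "weighted mean of frequency bins" can be sketched in a few lines. This assumes you already have one spectrum frame as two parallel lists (bin frequencies and their magnitudes); the function name is illustrative.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report 0
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


print(spectral_centroid([1000, 3000], [1, 1]))  # 2000.0
```

A per-track figure would typically average this over many frames; that averaging step is left out here.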

GREEN · TIMBRAL

05 · Spectral Rolloff

The frequency below which 85% of the audio's spectral energy lives. Complements spectral centroid: rolloff captures where the "ceiling" of energy is, centroid captures where the average sits. Useful for detecting how much air and shimmer a track has up top.

What's typical: 2,000-4,000 Hz on lo-fi tracks. 5,000-7,000 Hz on standard pop masters. 9,000-14,000 Hz on EDM and modern bright mixes.

Use it for: identifying tracks that share a frequency profile. Two songs with similar spectral centroid AND rolloff are almost certainly going to feel similar in the high end.

Measured by: cumulative spectral energy threshold (85%) · Unit: Hz · Typical range: 1,000-15,000
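
As a rough sketch of the 85% threshold: walk up the spectrum, add up the energy, and stop at the first bin where the running total reaches 85% of the whole. It assumes one spectrum frame given as parallel lists sorted by ascending frequency; the function name is illustrative.

```python
def spectral_rolloff(freqs_hz, energies, fraction=0.85):
    """Lowest bin frequency at which cumulative energy reaches `fraction` of the total."""
    threshold = fraction * sum(energies)
    running = 0.0
    for freq, energy in zip(freqs_hz, energies):
        running += energy
        if running >= threshold:
            return freq
    return freqs_hz[-1]


print(spectral_rolloff([1000, 2000, 3000, 4000], [50, 30, 10, 10]))  # 3000
```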

GREEN · SPATIAL

06 · Stereo Width

How spread-out the stereo image is. Computed from the correlation between the left and right channels. A correlation of 1.0 = mono (both channels identical). A correlation near 0 = full stereo spread (the channels are mostly independent). Negative correlation = out-of-phase stereo (a deliberate widening trick, but one that can partially cancel when the track is summed to mono).

What's typical: well-produced pop and electronic tracks: 0.3-0.7 (reasonably wide). Mono recordings (a lot of older jazz, podcast voiceover): close to 1.0. Hyperwide EDM masters: 0.1-0.3. Anything below 0 is unusual and might indicate a phase problem.

Use it for: spotting whether a reference is using significant stereo spread (and whether you should). Some genres (singer-songwriter, lo-fi) intentionally stay tighter; others (EDM, modern rock) push wide.

Measured by: normalised L-R correlation (lower = wider) · Unit: dimensionless · Typical range: 0.1 (wide) to 1.0 (mono)
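
Here is a minimal sketch of the left/right correlation described above, assuming the two channels arrive as equal-length lists of samples. The function name and the choice to return 0.0 for a silent channel are illustrative assumptions.

```python
from math import sqrt


def channel_correlation(left, right):
    """Pearson correlation between left and right sample sequences."""
    n = len(left)
    mean_l, mean_r = sum(left) / n, sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # a silent or constant channel has no defined correlation
    return cov / sqrt(var_l * var_r)
```

Feeding it the same signal on both sides gives a value of about 1.0 (mono); feeding it a signal and its polarity-flipped copy gives about -1.0, the phase-problem case.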

GREEN · TIMBRAL

07 · Zero-Crossing Rate

How often the audio waveform crosses zero per second. High zero-crossing rate = lots of high-frequency content, transients, or noisy textures. Low zero-crossing rate = smooth, low-frequency content. Loosely, this is "noisiness" or "percussiveness."

What's typical: orchestral / cello / sustained instruments: 500-1,500 crossings/sec. Pop with cymbals and hi-hats: 1,500-3,500. Distorted guitar or aggressive synth: 3,500-6,000. White noise / cymbal washes: 8,000+.

Use it for: identifying tracks with similar percussive character. A track with a lot of cymbals will have a high zero-crossing rate; an ambient pad piece will have a low one.

Measured by: count of sign changes in audio waveform · Unit: crossings/sec · Typical range: 500-10,000
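
Counting sign changes is about as simple as audio analysis gets. A sketch, assuming mono samples in a list and a known sample rate; treating exact zeros as non-negative is one convention among several.

```python
def zero_crossing_rate(samples, sample_rate):
    """Sign changes per second; zeros are treated as non-negative."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings * sample_rate / len(samples)


print(zero_crossing_rate([1, -1, 1, -1], 4))  # 3.0
```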

GREEN · CLASSIFICATION

08 · Vocal vs. Instrumental

Does the track have a sung vocal lead or not? A binary classification (vocal or instrumental) based on detecting vocal formant patterns in the audio.

What's typical: most pop, rock, hip-hop, R&B will read "vocal." Instrumental electronic, ambient, soundtrack, classical instrumental, jazz instrumental will read "instrumental."

Use it for: filtering matches. If you need an instrumental bed under a podcast voiceover or video narration, set the filter to exclude vocal tracks. Most CC libraries include both, and there's no faster way to narrow than a vocal/instrumental switch.

Measured by: vocal-formant detection · Unit: boolean · "vocal" or "instrumental"

The 5 AMBER traits: ML-classifier outputs

These are the "mood-ish" descriptors. The numbers are useful directional indicators but should be treated as second opinions, not measured facts. They come from pre-trained machine-learning models built on millions of labelled tracks: they're statistical guesses, not measurements.

AMBER · MID-LEVEL

09 · Energy

How "intense" or "active" the song feels. High-energy = loud, fast, distorted, percussive, busy. Low-energy = quiet, slow, sparse, melodic. This is a holistic perceptual descriptor: not the same as loudness (which is purely a measurement) or BPM (which is purely a rhythm measurement).

What's typical: hardcore punk and EDM bangers: 0.85-1.0. Standard pop and rock: 0.5-0.8. Singer-songwriter and downtempo: 0.2-0.5. Ambient and meditation tracks: 0.0-0.2.

Use it for: mood matching. Two tracks at 120 BPM in C major can feel completely different: one might be a quiet acoustic ballad, the other an aggressive house track. Energy disambiguates them.

Measured by: ML classifier trained on labelled examples · Unit: 0.0-1.0 scale · Typical: most songs land in 0.3-0.8

AMBER · MID-LEVEL

10 · Danceability

How well the song fits a steady, danceable rhythm. High danceability = stable tempo, clear pulse, locked-in groove. Low danceability = irregular meter, lots of rubato (tempo flex), or just no clear beat to move to.

What's typical: disco, house, funk, contemporary R&B: 0.7-0.95. Pop and hip-hop with steady beats: 0.5-0.8. Rock with live drums and tempo variation: 0.3-0.6. Free-jazz, classical, ambient: 0.0-0.3.

Use it for: playlist building. If you're scoring a workout video, filter for danceability > 0.7. If you're building a chill-cafe playlist, filter for 0.3-0.5.

Measured by: ML classifier · Unit: 0.0-1.0 scale

AMBER · MID-LEVEL

11 · Valence (Happy/Sad)

How positive or negative the song sounds emotionally. High valence = happy, cheerful, euphoric. Low valence = sad, dark, melancholic, tense. This is the most subjective of the AMBER traits (what one listener calls "uplifting" another might call "bittersweet"), so treat it as approximate.

What's typical: sad ballads, blues, certain post-rock: 0.0-0.3. Mid-emotion indie and most film score: 0.3-0.6. Pop, funk, dance, gospel: 0.6-0.9. Hyperactive pop and children's music: 0.9-1.0.

Use it for: mood-matching CC alternatives to a copyrighted reference. If your reference is a sad ballad (valence 0.2), filter the matches by valence < 0.4 to keep the emotional register consistent.

Measured by: ML classifier · Unit: 0.0-1.0 scale · Note: cross-cultural valence labels are noisy in training data, so this is the noisiest AMBER trait

AMBER · TIMBRAL

12 · Acousticness

How "acoustic" (real instruments) vs. "electronic" (synths, drum machines, samples) the song sounds. Pure acoustic recording with no electronics = 1.0. Pure synthetic with no acoustic content = 0.0. Most modern music sits in the middle.

What's typical: solo piano or string quartet: 0.85-1.0. Singer-songwriter with acoustic guitar: 0.6-0.9. Pop with both acoustic and electronic elements: 0.2-0.6. EDM, synthwave, electronic dance: 0.0-0.2.

Use it for: finding CC matches that share the production aesthetic of your reference. An acoustic singer-songwriter reference shouldn't get matched against synthwave, and acousticness is the trait that catches that.

Measured by: CLAP zero-shot classifier ("acoustic" vs "electronic" prompt) · Unit: 0.0-1.0 scale

AMBER · CLASSIFICATION

13 · Instrumentalness

Different from the GREEN vocal/instrumental detector: instrumentalness is a continuous score reflecting how likely the track is to have NO vocal content at all, including spoken word and adlibs. The GREEN detector is strictly binary (sung vocal or not).

What's typical: pure orchestral, ambient, instrumental electronic: 0.9-1.0. Mostly instrumental with rare vocal samples or "ooh" backing: 0.5-0.8. Pop with verses: 0.0-0.3. Spoken-word podcast: 0.0-0.1.

Use it for: filtering by use case. Instrumental beds for podcast voiceover should target instrumentalness > 0.7. Vocal-led pop should target < 0.3.

Measured by: CLAP zero-shot classifier ("instrumental" vs "vocal/sung" prompt) · Unit: 0.0-1.0 scale

How to actually read a readout

When you upload a song to everysong, you'll see all 13 of these numbers across the top of the card. Here's a quick "is this reading sane?" check:

Glance at the BPM first. If it looks wildly off (e.g. it says 200 BPM but you can clearly tap a slow 100 BPM beat), the engine has probably made a half-time/double-time error. The matches will still be useful but biased toward the misdetected tempo.
Check the key. If you have absolute pitch or know the song's key, verify. Most pop song keys are easily checkable against the chord patterns on Ultimate Guitar or similar.
Look at LUFS. If you're producing a track yourself and trying to match the reference's loudness target, this is the number you'll consult most often.
Use spectral centroid + rolloff together. Two tracks with similar values in both are almost certainly going to feel similar in their high-frequency character.
Treat AMBER traits as a tie-breaker. When you have two candidate matches with similar GREEN profiles, the AMBER traits (energy, valence, danceability) help you pick the one with the closer mood.

Worked example: matching an acoustic indie folk track

Pretend you uploaded a quiet acoustic indie folk song: solo voice, fingerpicked guitar, no drums. A reasonable readout might look like:

BPM:              82
Key:              D minor
LUFS:             -18.5
Spectral centroid: 1,820 Hz
Spectral rolloff:  4,400 Hz
Stereo width:     0.55
Zero-crossing:    1,100 / sec
Vocal/inst:       vocal
Energy:           0.32
Danceability:     0.41
Valence:          0.28
Acousticness:     0.87
Instrumentalness: 0.05

Reading this: slow tempo, minor key (so probably melancholic), moderately quiet (-18.5 LUFS is below the platform targets, so it will get normalised UP slightly), darker spectral profile (centroid below 2 kHz), reasonably wide stereo image, low percussive content, sung vocal, low energy, low danceability, sad-leaning emotion, strongly acoustic, definitely vocal-led.

The matches you'd see in this case would skew toward: slow tempo acoustic singer-songwriter tracks in minor keys. Other indie folk, some ballads, possibly some acoustic blues or chamber-pop. The engine would NOT return upbeat dance tracks because the energy, danceability, and acousticness all push hard against that.

Worked example: matching a deep house track

Pretend instead you uploaded a 4/4 deep house track:

BPM:              124
Key:              G minor
LUFS:             -8.2
Spectral centroid: 2,950 Hz
Spectral rolloff:  9,200 Hz
Stereo width:     0.32
Zero-crossing:    2,400 / sec
Vocal/inst:       instrumental
Energy:           0.78
Danceability:     0.88
Valence:          0.55
Acousticness:     0.05
Instrumentalness: 0.92

Reading this: standard house BPM, minor key (deep house is mostly minor), loud master (well above platform target), bright spectral profile, wide stereo, percussive (kick + hat content), instrumental, high energy, very danceable, emotionally neutral-to-positive, fully synthetic.

The matches would skew toward: house, techno, deep electronic, possibly some downtempo electronic for the wider net. The engine wouldn't return ambient (energy far too low) or acoustic folk (acousticness mismatch). The danceability and instrumentalness traits both push in the same direction here.
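
To see how far apart the two worked readouts sit, you can subtract them trait by trait. This is a sketch only: the dictionary keys are made-up names for the numeric traits in the readouts above, and it says nothing about how the engine itself scores matches.

```python
folk = {"bpm": 82, "lufs": -18.5, "centroid_hz": 1820, "rolloff_hz": 4400,
        "stereo_width": 0.55, "energy": 0.32, "danceability": 0.41,
        "valence": 0.28, "acousticness": 0.87, "instrumentalness": 0.05}
house = {"bpm": 124, "lufs": -8.2, "centroid_hz": 2950, "rolloff_hz": 9200,
         "stereo_width": 0.32, "energy": 0.78, "danceability": 0.88,
         "valence": 0.55, "acousticness": 0.05, "instrumentalness": 0.92}


def trait_deltas(track, reference):
    """Per-trait difference (track minus reference), rounded to 2 places."""
    return {name: round(track[name] - reference[name], 2) for name in reference}


deltas = trait_deltas(house, folk)
print(deltas["lufs"], deltas["acousticness"])  # 10.3 -0.82
```

The same subtraction works for a work-in-progress mix against a reference: the largest deltas show where the two have drifted apart.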

Why these 13 specifically

v1 deliberately ships only these 13 because each one is well-validated by years of academic music information retrieval research (the GREEN tier) or trained on enough labelled data to be reliably useful (the AMBER tier). The landing page mentions a wider "atlas" of ~190 experimental traits: those are aspirational, an honest statement of what the engine could theoretically compute, not features that v1 actually ships. We deliberately don't show numbers we haven't validated.

One trait we considered and dropped: genre classification. Pre-trained genre classifiers inherit the label distribution of the Free Music Archive data they were trained on, so a track that's clearly "deep house" to a human ear can get classified as "electronic > experimental" because that's how FMA's labelling skews. Until we can run the model against a cleaner, broader genre dataset, the genre column stays out of v1.

What to do with all this

Three practical uses for the 13-trait readout:

As a mix-reference comparison. Upload your work-in-progress next to a reference you're chasing. The deltas tell you where you've drifted: LUFS, spectral balance, stereo width.
As a match-quality sanity check. Before trusting the 20 CC matches the engine returns, verify the 13 traits "look right" for your reference. If the BPM is half-time misdetected, the matches will be off.
As a song-tagging system for your own library. If you keep a personal catalogue of demos, run each one through everysong and the 13-trait readout becomes a quantitative tag set. Filter by "all my demos with valence < 0.4 and danceability > 0.7" and you've got your sad-club-bangers shortlist.
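
The tagging idea can be sketched with nothing more than a list of dictionaries. The catalogue below is invented for illustration, and the field names are assumptions rather than an everysong export format.

```python
# Hypothetical catalogue: one dict per demo, using trait names as keys.
library = [
    {"title": "demo_a", "valence": 0.25, "danceability": 0.81},
    {"title": "demo_b", "valence": 0.70, "danceability": 0.90},
    {"title": "demo_c", "valence": 0.30, "danceability": 0.40},
]


def sad_club_bangers(tracks, max_valence=0.4, min_danceability=0.7):
    """Titles of tracks with valence below and danceability above the thresholds."""
    return [
        t["title"]
        for t in tracks
        if t["valence"] < max_valence and t["danceability"] > min_danceability
    ]


print(sad_club_bangers(library))  # ['demo_a']
```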
