Or, why algorithmic recommendations leave so much to be desired.
You can find the code and data for this article at this link. It's all hosted on Deepnote, a new kind of data notebook designed for collaboration.
You have your headphones on with your favorite Spotify playlist on shuffle. You like this music, but somehow it just isn't doing it for you today. You hit the next button a few times in vain, searching for a feeling.
Determined, you start scrolling through your algorithmically created playlists, the ones Spotify makes just for you. But they don't feel like they are for you. They feel like they are for someone who looks like you: the stereotypical you.
If this sounds familiar, you aren't alone. I experience this regularly, and when I started telling others, they shared similar stories of their own. So, like any data-oriented person, I decided to do entirely too much work to learn a few things. In the end, I did come up with the insight I was looking for.
It wasn't a straight line to the insight, though.
Side quest: How has music changed over the years?
Music takes me to a time and a place. Through the lens of that time and place, it gives me a feeling, an energy, or maybe a mood. For me, that time was from 2002 to around 2009, during high school and most of my undergraduate years.
In search of data to explore more, I found this Wikipedia collection of the Billboard Hot 100 Top-Ten singles for every year from 1958 to 2022. To gather this data, I scraped each page and collected it all into a dataset of 4,251 songs. Here is a sample:
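The scraper itself isn't shown here. As a rough sketch of the parsing step, the Python below pulls cell text out of any `<table class="wikitable">` using only the standard library's `html.parser`. The page-title pattern in `page_url` is an assumption about how Wikipedia names the per-year lists, and the parser deliberately ignores `rowspan`, footnotes, and the other quirks a real scraper for these pages would have to handle.

```python
from html.parser import HTMLParser
from urllib.parse import quote


def page_url(year):
    """Build the assumed Wikipedia URL for one year's top-ten list."""
    # Assumption: titles follow "List of Billboard Hot 100 top-ten singles in <year>".
    title = f"List of Billboard Hot 100 top-ten singles in {year}"
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


class WikitableParser(HTMLParser):
    """Collect the text of each row in tables whose class list includes 'wikitable'."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._depth = 0     # table nesting depth inside a wikitable (0 = outside)
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth or "wikitable" in classes:
                self._depth += 1
        elif self._depth and tag == "tr":
            self._row = []
        elif self._depth and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so each cell becomes one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
```

Feeding the parser each year's page HTML and tagging every row with its year would yield one combined dataset. Fetching the pages (for example with `urllib.request.urlopen`) is left out so the sketch runs offline.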
This data isn't entirely useful by itself. Since we are dealing with Spotify, I thought maybe the Spotify Developer API could give me something to work with. Luckily, Spotify has an API endpoint that lets you get "features" from a track. These include things like acousticness, duration, danceability, etc.
Here is a sample:
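Assuming you already have a Spotify track ID for each song (the article doesn't say how the Billboard titles were matched to tracks), a minimal sketch of the feature lookup could use the Web API's batch audio-features endpoint, which accepts up to 100 comma-separated IDs per request and needs an OAuth bearer token. Spotify has since restricted access to this endpoint for new apps, so check the current Web API reference before relying on it. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
MAX_IDS_PER_REQUEST = 100  # the batch endpoint accepts at most 100 track IDs


def batches(items, size=MAX_IDS_PER_REQUEST):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def features_request(track_ids, token):
    """Build (but do not send) a GET request for one batch of track IDs."""
    query = urllib.parse.urlencode({"ids": ",".join(track_ids)})
    return urllib.request.Request(
        f"{AUDIO_FEATURES_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_audio_features(track_ids, token):
    """Fetch features for all IDs, one HTTP call per batch of up to 100."""
    results = []
    for batch in batches(track_ids):
        with urllib.request.urlopen(features_request(batch, token)) as resp:
            payload = json.load(resp)
        # IDs Spotify can't resolve may come back as null entries; skip them.
        results.extend(f for f in payload["audio_features"] if f)
    return results
```

Batching keeps a dataset of a few thousand songs to a few dozen requests instead of one request per track.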
After pulling down these features, I did some exploration. Since I had tracks from so many years, I thought it would be interesting to see how music had changed over time, so I averaged each track feature by year and plotted the results. Several things caught my attention.
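The per-year averaging step is easy to sketch. The version below uses only the standard library and assumes each track is a dict holding its Billboard `year` plus the feature fields returned by the API (`duration_ms` is in milliseconds). The feature list is an illustrative choice, not necessarily the one behind the plots.

```python
from collections import defaultdict
from statistics import mean

# Illustrative subset of Spotify's audio features.
FEATURES = ["danceability", "energy", "valence", "acousticness", "duration_ms"]


def yearly_averages(tracks, features=FEATURES):
    """Return {year: {feature: mean value}} for dicts that carry a 'year' key."""
    by_year = defaultdict(list)
    for track in tracks:
        by_year[track["year"]].append(track)
    # Sorting by year keeps the result in chronological order for plotting.
    return {
        year: {f: mean(t[f] for t in rows) for f in features}
        for year, rows in sorted(by_year.items())
    }
```

Dividing an averaged `duration_ms` by 60,000 converts it to minutes. With pandas, the same aggregation is `df.groupby("year")[FEATURES].mean()`.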
First, from the mid-1970s until the early 2000s, the average track duration was over 4 minutes. Maybe attention spans are shrinking, but I certainly remember music from my youth often being too long. Perhaps notably, the first iPod was introduced in October 2001.
Next, I found that loudness increased dramatically from the mid-90s on and has only recently started trending down. After a bit of research, I learned that this has a name: the loudness war. Apparently, the competition for our ears is just as fierce as the competition for our eyeballs.
This became such a problem that Spotify (and other music services) started normalizing the audio of tracks.
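Loudness normalization boils down to simple arithmetic: measure a track's integrated loudness (in LUFS, per ITU-R BS.1770), then apply the gain that moves it to a target level. The sketch below assumes Spotify's documented default target of -14 LUFS. It ignores the clipping and peak-limiting safeguards a real player needs when it turns a quiet track up, and note that the API's `loudness` feature is an average in dB, not the same measurement.

```python
import math

# Assumption: Spotify's default ("Normal") target; verify against current docs.
SPOTIFY_DEFAULT_TARGET_LUFS = -14.0


def normalization_gain_db(measured_lufs, target_lufs=SPOTIFY_DEFAULT_TARGET_LUFS):
    """Gain in dB that moves a track's integrated loudness to the target."""
    return target_lufs - measured_lufs


def db_to_amplitude(gain_db):
    """Convert a dB gain into a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)
```

A loudness-war master measured at -6 LUFS would be turned down by 8 dB (roughly 0.4x the amplitude), while a quiet -20 LUFS track would be turned up by 6 dB (roughly 2x), so neither wins by being mastered louder.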
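Spotify tracks other features too, like "speechiness," which measures how much a track sounds like pure speech. According to Spotify's documentation, values above 0.66 describe tracks that are probably made entirely of spoken words, values between 0.33 and 0.66 describe tracks that may contain both music and speech (rap, for example), and values below 0.33 most likely represent music.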
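Spotify's documentation describes speechiness in three bands: above 0.66 is probably entirely spoken word, 0.33 to 0.66 may mix music and speech (rap, for example), and below 0.33 is most likely music. A hypothetical helper that maps a score onto those bands could look like this; how values exactly at 0.33 or 0.66 are treated is my own choice, not something Spotify specifies.

```python
def speechiness_band(value):
    """Map a Spotify speechiness score (0.0 to 1.0) onto the documented bands."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"speechiness must be in [0, 1], got {value}")
    if value > 0.66:
        return "probably entirely spoken word"
    if value >= 0.33:  # boundary handling here is an assumption
        return "music and speech (e.g., rap)"
    return "mostly music"
```

Applied to yearly averages, every year in this dataset would fall in the "mostly music" band, which is why the trend matters more than the absolute level.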
Average speechiness has increased notably, but it stays well below 0.33, so top tracks are nowhere near pure speech. There was a similar bump in the 2000s, so this could be part of a longer-term trend.
At the same time, instrumentalness, Spotify's prediction of whether a track contains no vocals, has fallen to near 0. This suggests that instrumental songs used to have a real chance of reaching the top 10, whereas now they have much less of one.
The blip in 2011, where the two lines cross, looks like an obvious signal. However, I know so little about music trends that I can't interpret it.
Some features clearly trended downward. Valence, the musical positiveness of a track, has been steadily decreasing. At the same time, acousticness (i.e., how likely it is that the music was produced with physical instruments that vibrate the air) has also been decreasing.
I associate acoustic music with more positive feelings, though I'm not sure that is generally true. One welcome change: acousticness jumped in 2019.
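Whether acousticness and valence really move together can be checked with a correlation coefficient. The sketch below computes Pearson's r by hand (Python 3.10+ also ships `statistics.correlation`). Feeding it the yearly averages of the two features is an assumed usage, not something the analysis above reports doing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

One caution: two series that both drift downward over the decades will correlate strongly even if they are unrelated, so correlating per-track values, or year-over-year changes, would be a fairer test of the hunch.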
Last, both energy and danceability (how suitable a track is for dancing) have been trending up, although not as dramatically as some of the other features. Top tracks today are more than 10% more danceable than they were in 1958.
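Because danceability is a 0-to-1 score, "10% more danceable" can be read two ways: as a relative change, or as a change in points on the scale. These hypothetical helpers make the difference concrete.

```python
def relative_change_pct(old, new):
    """Percent change from old to new, relative to old."""
    if old == 0:
        raise ValueError("relative change is undefined when the old value is 0")
    return (new - old) / old * 100


def point_change(old, new):
    """Absolute change on a 0-1 score, in points (0.1 on the scale == 10 points)."""
    return (new - old) * 100
```

For example, a move from 0.5 to 0.55 is a 10% relative increase but only a 5-point change.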
Energy has been closely correlated with danceability. Part of me wants to investigate where the two lines cross (the mid-2010s, early 2000s, and late 90s) to see whether those crossings signal something. I'll leave that for another day.
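Finding where two yearly series cross comes down to watching for the sign of their difference to flip. Here is a minimal sketch, assuming both features have already been averaged per year into equal-length lists aligned with `years`; the function name is hypothetical.

```python
def crossing_years(years, a, b):
    """Return the years at which series `a` and `b` swap order.

    Each reported year is the first year of the new ordering. Years where the
    two values are exactly equal are skipped when deciding the order.
    """
    crossings = []
    prev_sign = 0
    for year, x, y in zip(years, a, b):
        diff = x - y
        sign = (diff > 0) - (diff < 0)  # +1, -1, or 0 for a tie
        if sign == 0:
            continue
        if prev_sign and sign != prev_sign:
            crossings.append(year)
        prev_sign = sign
    return crossings
```

Running it on the yearly energy and danceability averages would list the crossover years to dig into.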
Unfortunately, all of this exploration didn't get me closer to understanding how to find better music for myself. I did get some important perspective, though.
The Solution: Curation
Exploring all this music confirmed my core idea: music takes you to a time and a place. Spotify knows what I listen to, what my playlists are, and which songs I've hit the heart button on. It knows shockingly little else about me.
To recommend music tied to a time and a place in my life, Spotify would need to know about my life: things like my age, where I went to high school, and where I went to university. I've never told Spotify these things (although I did connect my Facebook profile for "social listening").
Taking this a step further, I have a theory that most people's taste in music is formed in their late high school and early college years. Those years left a significant mark on my life; they were the beginning of adulthood. I've noticed that I'm now starting to listen to more of the music I didn't like at the time but that was around me a lot (*gasp* country music).
So how could Spotify, or other music services, give a better recommendation experience? Through curation and discoverability.
What is curation? Simply put, collecting good work and adding context to it. That context, for me, is the time and place where I discovered that music. Your context may be different.
Playlists are a form of curation. Spotify clearly puts a lot of work into auto-generating playlists just for you, your "top mixes." But these are collections without context. They are simply regurgitations of music I've listened to recently, sorted by musician, genre, or time period. These playlists never help me find music I love but haven't heard in years.
I've found that the best playlists on Spotify are user-generated. My current favorite seems to be made by someone with tastes very similar to mine. It took me weeks of futile searching and trying algorithmic playlists to find it. The problem is clearly discoverability.
This problem isn't limited to music either. A recent market survey showed that 84% of online shoppers say personalization influences their purchases. The same survey found that 70% of shoppers say it takes multiple tries to find what they want.
Amazon's recommendations are terrible. Apple Music has laughable recommendations. Spotify is on the better end of the spectrum, but there is much to be desired. There seems to be an obvious market opportunity here for curation and discoverability for nearly anything.
I wanted to use the data I had to make a small contribution to curation and discoverability. So I put together a small app that lets you select a range of years and links you to Spotify playlists of the songs that made the Billboard Hot 100 top ten in those years.
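The app's core logic amounts to filtering the dataset by a year range. A minimal sketch, assuming each song record carries a `year` and a Spotify track `uri` (both hypothetical field names here); in Streamlit, a range slider such as `st.slider` with a tuple default could supply the two endpoints.

```python
def songs_by_year(songs, start_year, end_year):
    """Group track URIs by year for every year in the inclusive range."""
    if start_year > end_year:
        start_year, end_year = end_year, start_year  # tolerate a reversed range
    # Pre-seed every year so years with no matching songs still appear.
    grouped = {year: [] for year in range(start_year, end_year + 1)}
    for song in songs:
        if song["year"] in grouped:
            grouped[song["year"]].append(song["uri"])
    return grouped
```

Each year's list of URIs could then back one Spotify playlist.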
Are these guaranteed to be amazing? Most definitely not. I was surprised at just how many songs make the top 10. Each year can have 40-100 songs, and that is probably more than you actually like from that year.
Nevertheless, I present it to you in all its simplicity. To view the app on a separate page, click here.
As with every data story, you can access all the code I used to perform the analysis and create the Streamlit app above. I host it all on Deepnote, a great place to perform analysis and share it with others.