Are you being too picky choosing a partner? A data-driven approach.

Photo by Nong V / Unsplash
TLDR: Using Census data, I built an app to see how many people match your demographic criteria in a metro area. Scroll down to find the app. In the post, I go through my inspiration for the app and some other considerations.
You can find the code and data for this article at this link. It's all hosted on Deepnote, a new kind of data notebook designed for collaboration. Thank you to Deepnote for sponsoring this week's article.

If you are single and looking for a partner, a lot of doubt can creep in. When I was single, I calmed my mind by thinking about the problem from a data perspective. Certainly, finding your ideal partner is a matter of having a large enough population + a bit of luck running into them. Right?

After all, the number of people interested in you is some subset of the population. The number of people you are interested in is another subset of the population. Where those two sets meet is what you want.

Venn diagram with the overlapping section highlighted.

Using publicly available data, we can figure out at least one of these subsets of people, the population that meets your basic criteria. By choosing some desired demographic features in your potential partner, we can determine the total population matching those criteria in a specific area. More on this in a second.

This whole article was inspired by a TED talk by Amy Webb in 2013. Amy describes her struggle with online dating and how she "hacked" the system using data. She does some napkin math to figure out how many people in Philadelphia, where she lived, match her requirements.

List of math showing the population of Philadelphia and narrowing it down by demographics. Out of 1.5 million people only 35 people met her criteria.
Screengrab from the TED Talk.

She had some very specific requirements and ended up with a potential population of 35. My biggest takeaway from the TED talk is that some factors will narrow your choices more than others.

Amy used some rough numbers to make her calculations. I decided that it would be better to use hard data from an excellent source of demographic data in the United States, the Census Bureau. Using the 2021 American Community Survey 1-year estimates, I pulled population numbers for the top 100 metro areas in the US by population.

Specifically, I used population numbers for location (metro area), race, gender, age, and education level. If you are interested in the technical details of how I did this, check out the code here. With all the data collected, I put together an app that allows you to find at least one of the pieces of your Venn diagram. The data looks like this.

Table of data showing populations for metro areas including gender, age, and race.

Notably, the Census Bureau keeps track of subpopulations for gender, age, and race down to the county level. I chose to use metropolitan areas instead, as Americans usually think about where they live in those terms (at least I do).
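For readers curious what a request for this kind of data might look like, here is a minimal sketch that only builds a Census API URL for the 2021 ACS 1-year dataset at the metro-area level. It is not the code behind this post (that lives on Deepnote). The helper name `build_acs_url` is hypothetical, and the variable `B01001_001E` (total population from the sex-by-age table) is just an illustrative choice, not necessarily one of the variables I pulled.

```python
from urllib.parse import urlencode

# Public endpoint for the 2021 ACS 1-year estimates.
BASE_URL = "https://api.census.gov/data/2021/acs/acs1"

# Geography string the Census API uses for metro and micro areas.
METRO_GEOGRAPHY = "metropolitan statistical area/micropolitan statistical area:*"


def build_acs_url(variables, geography=METRO_GEOGRAPHY, api_key=None):
    """Build (but do not send) a query URL. Hypothetical helper for illustration."""
    params = {
        "get": ",".join(["NAME", *variables]),  # NAME returns the area's label
        "for": geography,
    }
    if api_key:
        params["key"] = api_key  # a key is optional for light usage
    return f"{BASE_URL}?{urlencode(params)}"


# Assumption: B01001_001E is used here only as an example variable.
url = build_acs_url(["B01001_001E"])
print(url)
```

Swapping the geography string for a county-level one would request the same variables per county instead of per metro area.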

Location location location

One of the biggest takeaways from this data is that location matters. For example, in Akron, Ohio, the number of white (or white, not Hispanic or Latino) females aged 30-34 with a graduate degree is estimated to be 4,648.

  • In the Dallas-Fort Worth metro area, the same criteria match an estimated 37,100 people. That is a nearly 700% increase.
  • In the New York metro area, the same criteria shoot up to an estimated 147,693 people! That's more than a 3,000% increase!
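If you want to check those percentages, the arithmetic is short. This is a quick sketch using only the three estimates quoted above. The function name `percent_increase` is my own label for the standard formula (new minus old, divided by old).

```python
def percent_increase(baseline: int, other: int) -> float:
    """Percent increase going from baseline to other."""
    return (other - baseline) / baseline * 100


# Estimates quoted in the text above.
akron = 4_648
dallas_fort_worth = 37_100
new_york = 147_693

for name, population in [
    ("Dallas-Fort Worth", dallas_fort_worth),
    ("New York", new_york),
]:
    ratio = population / akron
    increase = percent_increase(akron, population)
    print(f"{name}: {ratio:.1f}x Akron, a {increase:.0f}% increase")
```

Note that a ratio of roughly 8x corresponds to an increase of roughly 700%, since the increase is measured on top of the original 100%.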

While a population of 4,648 sounds like a good number, the pool of demographic matches is far larger in bigger cities. Whether those larger numbers yield better quality matches and, therefore, better relationships is another question.

Here is what the funnel from Akron looks like.

Match funnel for white or white, not Hispanic for Akron, Ohio. A final population of 4,648 remains.
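The idea behind a funnel like this can be sketched as a series of filters applied one after another, recording how many people remain at each step. The rows and counts below are entirely made up for illustration (they are not Census numbers), the sketch leaves out race for brevity, and `funnel` is a hypothetical helper, not the app's actual code.

```python
# Hypothetical data: each row is one demographic cell with a made-up count.
ROWS = [
    {"metro": "Metro A", "gender": "female", "age": "30-34", "education": "graduate", "population": 1_000},
    {"metro": "Metro A", "gender": "female", "age": "30-34", "education": "bachelor", "population": 3_000},
    {"metro": "Metro A", "gender": "female", "age": "35-39", "education": "graduate", "population": 1_200},
    {"metro": "Metro A", "gender": "male", "age": "30-34", "education": "graduate", "population": 900},
    {"metro": "Metro B", "gender": "female", "age": "30-34", "education": "graduate", "population": 5_000},
]


def funnel(rows, criteria):
    """Apply criteria in order; return (step label, remaining population) pairs."""
    steps = [("everyone", sum(r["population"] for r in rows))]
    remaining = rows
    for column, allowed in criteria:
        # Keep only the rows whose value for this column is acceptable.
        remaining = [r for r in remaining if r[column] in allowed]
        label = f"{column}: {', '.join(sorted(allowed))}"
        steps.append((label, sum(r["population"] for r in remaining)))
    return steps


criteria = [
    ("metro", {"Metro A"}),
    ("gender", {"female"}),
    ("age", {"30-34"}),
    ("education", {"graduate"}),
]

for label, population in funnel(ROWS, criteria):
    print(f"{label}: {population:,}")
```

Allowing more than one value per criterion (say, two age ranges) just means putting more items in the set for that step, which widens the funnel at that point.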

Serendipity matters, right?

Back when Blockbuster existed (one is still left, apparently), our view of the movie world was limited to what would fit on the shelves. It wasn't until Netflix's DVD-by-mail service that we had access to a long tail of interests.

Dating isn't much different. Relying on meeting your future partner in person through serendipity limits your choices to a small percentage of your potential matches. Probabilistically, online dating has supercharged serendipity so that you have many, many more chances of "running into" someone where your Venn diagrams match up.

Of course, the quality of the serendipity these apps provide varies. While some dating apps focus on surface-level features (ahem, Tinder et al.), others do attempt to find you a meaningful connection. So serendipity still matters because you need to be able to meet in the same space, virtually or in person.

Demographics are just one step

Through the Census API we are able to get estimates of populations by 5 different demographic variables. Of course, demographics can only get you so far. Most of us (I hope) have criteria in a partner far beyond the surface level. We care about a sense of humor, alignment of worldview, and other much harder-to-quantify variables.

My own experience has taught me to keep my funnel as open as possible. Nearly 10 years ago I met my wife, and if I had kept my demographic funnel too narrow, I would never have met her. I hope that this app, besides being a bit of fun, can help you understand how your own criteria in a partner can open up or limit your choices.

So the rest of the work of finding a partner is left up to you (or outsourced to your dating app).

The App

To let you play with the data yourself, I put together a simple web app. You select your desired location, race(s), gender, age range(s), and education level(s). The app then tells you the estimated number of people matching those criteria where you live.

You are given a detailed funnel showing how each of your criteria excludes more and more of the population. To view the app on a separate page, click here; otherwise, you can see it embedded here.

The Code

You can access all the code I used to perform the above analysis. I host it all on Deepnote, a great place to perform analysis and share it with others.

Click the button below to find the code and data on Deepnote.

Thank you to Deepnote for sponsoring this week's post. I host all the code and data for datafantic on Deepnote, and I believe it's an excellent tool for data professionals to collaborate and share their work.

Deepnote is a new kind of data notebook that's built for collaboration: Jupyter compatible, works magically in the cloud, and sharing is as easy as sending a link.