

Discover more from Terms of Service with Chris Martin
One of the most dire consequences of our parasitic relationship with the social internet, in my view, is our willingness to freely give up treasure troves of data about ourselves. Just in the last week or so, it was revealed that 533 million Facebook users had their personal data leaked from the platform. The data included phone numbers and email addresses. It ranks among the top five internet data leaks in history.
This information was not hacked out of Facebook; it was “scraped.” Someone hacking information is like breaking into your car, hot-wiring it, and stealing it. Someone scraping information is like stealing your car because the keys were in the ignition and the door was unlocked. Facebook was not hacked. Its system was used “correctly,” but exploited, and the data was leaked.
Someone, in essence, put every phone number in the world into a contact book, said, “Facebook, connect me to my contacts” (one of Facebook’s oldest features), and was then able to leak the phone numbers, email addresses, and other information of 533 million people.
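To make the mechanism concrete, here is a minimal sketch of how that kind of contact-sync abuse works in principle. Nothing here touches any real Facebook API; `fake_contact_lookup`, `FAKE_ACCOUNTS`, and the phone numbers are all invented stand-ins that simulate a service matching uploaded “contacts” to registered accounts:

```python
# Hypothetical illustration only -- no real service or API is involved.
# The "service" holds a private mapping of phone numbers to accounts.
FAKE_ACCOUNTS = {
    "+15550000042": {"name": "Alice Example", "email": "alice@example.com"},
    "+15550000977": {"name": "Bob Example", "email": "bob@example.com"},
}

def fake_contact_lookup(uploaded_numbers):
    """Simulates a contact-sync endpoint: for each uploaded number that
    matches a registered user, return that user's profile data."""
    return {n: FAKE_ACCOUNTS[n] for n in uploaded_numbers if n in FAKE_ACCOUNTS}

def enumerate_numbers(prefix, start, end):
    """Generate every candidate number in a range -- the scraper's fake
    'contact book' stuffed with guessed numbers."""
    return [f"{prefix}{i:07d}" for i in range(start, end)]

# The scraper uploads an entire block of guessed numbers as "contacts"...
candidates = enumerate_numbers("+1555", 0, 1000)

# ...and the friend-finding feature, used exactly as designed, hands back
# profile data for every number that belongs to a real user.
scraped = fake_contact_lookup(candidates)

for number, profile in scraped.items():
    print(number, "->", profile["name"], profile["email"])
```

Run over enough number ranges, that loop is the whole attack: no password is cracked and no server is breached, yet the scraper walks away with a directory linking phone numbers to identities.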
Facebook's initial response was simply that the data was previously reported on in 2019 and that the company patched the underlying vulnerability in August of that year. Old news. But a closer look at where, exactly, this data comes from produces a much murkier picture. In fact, the data, which first appeared on the criminal dark web in 2019, came from a breach that Facebook did not disclose in any significant detail at the time and only fully acknowledged Tuesday evening in a blog post attributed to product management director Mike Clark.
One source of the confusion was that Facebook has had any number of breaches and exposures from which this data could have originated. Was it the 540 million records—including Facebook IDs, comments, likes, and reaction data—exposed by a third party and disclosed by the security firm UpGuard in April 2019? Or was it the 419 million Facebook user records, including hundreds of millions of phone numbers, names, and Facebook IDs, scraped from the social network by bad actors before a 2018 Facebook policy change, that were exposed publicly and reported by TechCrunch in September 2019? Did it have something to do with the Cambridge Analytica third-party data sharing scandal of 2018? Or was this somehow related to the massive 2018 Facebook data breach that compromised access tokens and virtually all personal data from about 30 million users?
It’s sorta sad if you have to sift through multiple instances of hundreds of millions of people having their data leaked in the last three years to figure out which of the previous leaks this new information may be referencing.
If you have ever caught yourself saying, “Ahhhh. Who cares if [social media company] has my [personal information]. It’s all out there anyway,” save us your complaint when you start getting an exorbitant number of scam calls or emails in the next year or two from this leak. :-)
If you need to be convinced that you should care about your privacy on the internet, you can read a previous post of mine here. But today, I want to engage with the brilliance of Shoshana Zuboff and help you see how you possess the greatest resource on the face of the earth.
Get to Know “Surveillance Capitalism”
Shoshana Zuboff’s tome The Age of Surveillance Capitalism dives into the data-for-profit reality and the consequences of the attention economy. “Surveillance capitalism,” as Zuboff defines it, “unilaterally claims human experience as free raw material for translation into behavioral data.” Who participates in surveillance capitalism? Most internet companies you could name, but Facebook and Google are two of the most prominent players.
The key problem with surveillance capitalism, according to Zuboff, is that it is built upon manipulating people for the benefit of others. She writes, “Surveillance capitalists know everything about us, whereas their operations are designed to be unknowable to us. They accumulate vast domains of new knowledge from us, but not for us. They predict our futures for the sake of others’ gain, not ours.” We ought to be concerned with surveillance capitalism because it begins with internet users like you and me sharing information about our lives and it ends with large companies, or even governments, using that information we share for profit, surveillance, or worse, and we get nothing out of the deal except for the sweet satisfaction of personal expression and the occasional spookily-accurate ad for the moisturizing hand soap we were just talking about with our spouses.
Your “Data Exhaust” Is More Valuable Than Anything on Earth
At the core of surveillance capitalism is the most valuable resource available in the world today: “behavioral surplus.” What is behavioral surplus? Simply put, behavioral surplus is the data we intentionally and unintentionally deposit into the social internet in everything we do. It is often characterized as “data exhaust”: the leftover information we give off as we engage online. Behavioral surplus is how you word a sentence in a Google search. It’s where your iPhone says you are at any given moment. It’s your smart thermostat recognizing the temperature you prefer when you go to bed. It’s Instagram recognizing that you like pictures of dogs more than cats. Behavioral surplus is all of the little signals we give internet companies about who we are, where we live, what we care about, and what makes us tick—data that goes far beyond the status updates we actively type into these platforms.
In the early days of the social internet, Google recognized how much it could learn about its users from their searches. Google was picking up data from users not just from the actual words they typed, but from all kinds of other data breadcrumbs dropped along the way. Google, and eventually other companies, soon figured out they could use this user information to make a massive profit. This data didn’t cost Google anything, but it was incredibly valuable to advertisers. It was the leftover data beyond the search itself, hence the term “data exhaust.” The exhaust given off by our earliest Google searches was captured and sold to advertisers who wanted to get their ads in front of relevant audiences.
This discovery, that data breadcrumbs from users could be harvested and sold, was pioneered by Google and led to much of the social internet surveillance activity we see today. But in the early days, social internet companies were primarily concerned with harvesting data and selling advertisers access to it so they could place hyper-targeted ads. That method still exists, but the primary goal has shifted from harvesting as much behavioral surplus as possible to weaponizing that behavioral surplus for behavioral modification or manipulation.
Zuboff talked with a number of engineers and data scientists about what their apps, programs, and websites are designed to do as they collect data about user behavior. One chief data scientist at a well-respected Silicon Valley education company said (bolding mine):
The goal of everything we do is to change people’s actual behavior at scale. We want to figure out the construction of changing a person’s behavior, and then we want to change how lots of people are making their day-to-day decisions…. We can test how actionable our cues are for them and how profitable certain behaviors are for us.
That data scientist was just one of a handful who made similar comments.
So, to summarize very briefly: most of the apps on your phone and social internet websites you use on a regular basis primarily exist as a means to gather and monetize your data. Even further, they are actively looking for ways to manipulate you into making decisions that make them more money. We, the users, are nothing more than rich data deposits from which these companies can extract the most valuable resource in the world: information about who we are and how we like to spend our money.
You may have heard the quip before, “If it’s free you’re the product.” It’s clever, and it’s sorta true, but it isn’t as accurate as it could be. In reality, we are not the product. It’s actually worse than that. The data points that numerically represent our lives are the product. We are just the wells that are tapped for that data. To these companies, our hopes, dreams, and personalities are all window dressing that get in the way of the actions we take online.
Ok—So What?
I’ve cited this many times, but I think it bears repeating. In his book Amusing Ourselves to Death, Neil Postman writes about Lewis Mumford, a 20th century American writer and thinker:
Lewis Mumford…has been one of our great noticers. He is not the sort of a man who looks at a clock merely to see what time it is. Not that he lacks interest in the content of clocks, which is of concern to everyone from moment to moment, but he is far more interested in how a clock creates the idea of “moment to moment.”
Let’s be great noticers like Lewis Mumford. We can become so consumed with our feeds that we never ask, “What does social media cost me, and should I change my relationship with it?”
Your “data exhaust” is incredibly valuable, and creating too much of it can become a nuisance. We would probably all be well-served by creating a little less of it, so there’s less information to be used maliciously when it falls into the wrong hands—as it so often does.