Most of us care about privacy on the Internet: we don’t want the personal information we place online to be spread without our knowledge or consent. But with clever algorithms or even just a bit of statistics, intimate secrets that we don’t express at all can be surmised from the preferences we do express. A paper published online today in the Proceedings of the National Academy of Sciences (PNAS) shows just how revealing Facebook Likes can be.
Previous research has shown that basic demographic information (such as age, gender, and education level) and elements of personality can be guessed by looking at online profiles and browser histories. And Charles Duhigg reported in the New York Times Magazine last year that Target can figure out when its customers are pregnant based on subtle changes in shopping habits (unscented lotion and dietary supplements are clues), and the company outed one teenage girl to her father by mailing her crib coupons. But this new study, by Michal Kosinski and David Stillwell of the University of Cambridge and Thore Graepel of Microsoft Research, goes much further in demonstrating the potential of user profiling, documenting clues for intelligence, sexual orientation, and drug use.
The study was pretty simple: 58,000 volunteers shared their Facebook profiles and filled out additional surveys. The researchers looked for correlations between Likes and other attributes. That’s it. Below is the accuracy with which they could predict certain dichotomous (two-category) attributes based on Likes. (Accuracy here is defined as the probability of correctly categorizing two randomly selected people, one from each of the two paired categories, e.g., male and female. Statisticians call this measure the area under the ROC curve, or AUC; a score of 50% is no better than chance.)
- Single: 67%
- Parents Together at 21: 60%
- Smokes Cigarettes: 73%
- Drinks Alcohol: 70%
- Uses Drugs: 65%
- White vs. Black: 95% (There isn’t much overlap between what whites and blacks Like.)
- Christian vs. Muslim: 82%
- Democrat vs. Republican: 85%
- Gay vs. Straight Man: 88%
- Lesbian vs. Straight Woman: 75%
- Gender: 93%
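The pairwise definition of accuracy used for the list above can be made concrete with a few lines of Python. This is only a minimal sketch, not the researchers' code: the function name `pairwise_accuracy` and the scores are made up for illustration, and the model that would produce such scores from Likes is assumed to exist already. It takes every possible pair of one person from each category and counts how often the person who really belongs to the first category got the higher score.

```python
from itertools import product


def pairwise_accuracy(scores_a, scores_b):
    """Share of (a, b) pairs in which the group-A member outscores the group-B member.

    scores_a, scores_b: model scores for people who truly belong to group A
    and group B, where a higher score means "more likely group A".
    Ties count as half a correct guess, i.e. a coin flip.
    """
    wins = 0.0
    for a, b in product(scores_a, scores_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


# Hypothetical scores, invented for this example (not data from the study).
group_a = [0.9, 0.8, 0.4]
group_b = [0.7, 0.3, 0.2]

print(f"{pairwise_accuracy(group_a, group_b):.0%}")
```

With these made-up numbers, eight of the nine possible pairs are ordered correctly. A model that guessed at random would hover around 50%, which is why figures in the 60s are only modestly better than a coin toss.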
The researchers could also make decent guesses at non-dichotomous attributes. Below are the correlation strengths between their predictions and the actual values. A correlation coefficient above 0.3 is considered moderate, and one above 0.5 is strong.
- Satisfaction With Life: 0.17
- Intelligence: 0.39
- Emotional Stability: 0.3
- Agreeableness: 0.3
- Extraversion: 0.4
- Conscientiousness: 0.29
- Openness: 0.43
- Density of Friendship Network: 0.52
- Number of Facebook Friends: 0.47
- Age: 0.75
The test-retest reliability for openness (the correlation between the two scores one would get if one took the test twice) is 0.50, pretty close to 0.43, so for that trait, looking at someone’s Likes is almost as good as looking at their actual personality test score.
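For readers who want to see what a correlation between predicted and actual values looks like in practice, here is a small sketch of the standard Pearson coefficient in plain Python. The function name `pearson_r` and the ages are invented for illustration, and the article does not spell out exactly how the researchers computed their figures, so treat this as the textbook formula rather than their procedure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted and actual ages, invented for this example.
predicted_age = [22, 25, 31, 40, 28]
actual_age = [20, 27, 35, 38, 24]

print(round(pearson_r(predicted_age, actual_age), 2))
```

A value of 1 would mean the predictions rise and fall in perfect step with the real values, 0 would mean no linear relationship at all, and negative values would mean the predictions run backward.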
The volunteers had a median number of 68 Likes (ranging from 1 to 700), but the researchers found that knowing just one random Like from a person usually gave them a better-than-chance guess at the person’s gender, age, and openness. And the connections aren’t always obvious. For instance, the researchers point out, less than 5% of the gay volunteers Liked explicitly gay groups, but Liking Desperate Housewives was kind of a giveaway. (Okay, that one is maybe a bit obvious.)
Stillwell wrote me in an email:
My favourite part is that whereas previous researchers have linked behaviour online with personal traits, Facebook Likes have a meaning that we can use to understand the psychology behind what people do. For example, in the supplemental table for "parents separated at 21" a couple of the most predictive Likes of parental separation are for the Likes: "I'm sorry I love you" and "If I'm with you then I'm with you I don't want anybody else". So although our prediction is not very good (60%, which is just above chance at 50%), it gives us a poignant insight into the effects that parental breakup has on children even after they grow up. It was surprising to us that parental breakup has any effect at all on the things you choose to Like.
Certainly more attributes than those studied here could be detected using Facebook Likes, or using other online behavior. That could be great: targeted content, no more irrelevant ads. But the potential downside is also significant. The authors write: “Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation, or political views that an individual may not have intended to share. One can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life.”
I asked if the findings would change what the authors Like on Facebook. Stillwell replied:
I think it’s like deciding what to wear in the morning: We all know that choosing clothes is important because it gives others a certain impression about us. It’s the same with Likes. As long as you’re happy being associated with a certain Like, and you’re aware of who you’re sharing that with and what they might infer from that, then it’s an opportunity to control your image. We’re expecting a sudden influx of people liking "Curly Fries"!
Why curly fries? In supplementary online data, the researchers list the 400 or so Likes that are the best indicators of the attributes they measured. I’ve offered a small selection below. Each of these things was Liked by at least 100 of their volunteers.
- High IQ: “Morgan Freemans Voice,” “The Colbert Report,” “Curly Fries”; Low IQ: “Bret Michaels”
- Satisfied With Life: “Sarah Palin”; Dissatisfied: “Stewie Griffin,” “Science”
- Liberal & Artistic: “Dmt The Spirit Molecule” (the book by Rick Strassman); Conservative (personality, not politics): “I don’t read”
- Well Organized: “Law Officer,” “Accounting”; Spontaneous: “Join If Ur Fat”
- Extraverted: “Beerpong”; Introverted: “Minecraft”
- Cooperative: “The Book of Mormon” (presumably not the musical); Competitive: “Atheism/Satanism” (basically the same thing, right?), “I Hate You,” “I Hate Police,” “I Hate Everyone,” “Friedrich Nietzsche,” “Timmy South Park,” “Sun Tzu,” “Julius Caesar,” “Knives,” “Prada”
- Neurotic: “Kurt Donald Cobain”; Relaxed: “Parkour”
- Female: “Shoedazzle”; Male: “Dos Equis”
- Old: “Dr Mehmet Oz”; Young: “Dude Wait What”
- Many Friends: “The Dollar You Are Holding Could’ve Been In A Stripper’s Butt Crack”; Few Friends: “Walking With Your Friend & Randomly Pushing Them Into Someone/Something”
- Christian: “Gospel Music”; Muslim: “Desihits.com”
- Republican: “Glenn Beck”; Democrat: “Health Care Reform”
- Gay Man: “Wicked The Musical”; Straight Man: “Being Confused After Waking Up From Naps”
- Lesbian: “Sometimes I Just Lay In Bed And Think About Life”; Straight Woman: “Thinking Of Something And Laughing Alone,” “I Just Realized Immature Spells I’m Mature,” “Did You Get A Haircut No It Grew Shorter,” “Lipton Brisk”
- Black: “I Support My President,” “Madea”; White: “I Come From A Town Where A Traffic Jam Is 4 Cars Behind A Tractor,” “Bret Michaels”
- In a Relationship: “Scrapbooking”; Single: “Maria Sharapova”
- Alcohol Drinker: “Trying To Figure Out If Its A Cop Car”; Non-Drinker: “I Like Watching Raindrops Race Across My Window And Silently Cheer For Them”
- Drug User: “Austin Texas”; Non-User: “Milkshakes”
- Smoker: “Screwing Around In Walmart”; Non-Smoker: “That Spider Is More Scared Than U Are Oh Really Did It Tell U That”
- Parents Separated by 21: “You Need Anger Management Classes You Need Shut The Fukk Up Classes”; Parents Married: “Gene Wilder”