Most of us care about privacy on the Internet: we don’t want the personal information we place online to be spread without our knowledge or consent. But with clever algorithms or even just a bit of statistics, intimate secrets that we don’t express at all can be surmised from the preferences we do express. A paper published online today in the Proceedings of the National Academy of Sciences (PNAS) shows just how revealing Facebook Likes can be.
Previous research has shown that basic demographic information (such as age, gender, and education level) and elements of personality can be guessed by looking at online profiles and browser histories. And Charles Duhigg reported in the New York Times Magazine last year that Target can figure out when its customers are pregnant based on subtle changes in shopping habits (unscented lotion and dietary supplements are clues), and the company outed one teenage girl to her father by mailing her crib coupons. But this new study, by Michal Kosinski and David Stillwell of the University of Cambridge and Thore Graepel of Microsoft Research, goes much further in demonstrating the potential of user profiling, documenting clues for intelligence, sexual orientation, and drug use.
The study was pretty simple: 58,000 volunteers shared their Facebook profiles and filled out additional surveys. The researchers looked for correlations between Likes and other attributes. That’s it. Below is the accuracy with which they could predict certain dichotomous attributes based on Likes. (Accuracy here is defined as the probability of correctly categorizing two randomly selected people, one from each of the two paired categories—e.g., male and female.)
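To make that definition of accuracy concrete, here is a minimal Python sketch. It assumes each person has already been given a numeric prediction score by some model; the function name and the scores are made up for illustration and are not taken from the study. The quantity it computes is what statisticians call the area under the ROC curve (AUC): the share of cross-category pairs that the scores rank correctly, with 0.5 meaning chance.

```python
from itertools import product


def pairwise_accuracy(scores_a, scores_b):
    """Fraction of (a, b) pairs in which the person from category A
    gets the higher score. Ties count as half a correct guess.
    0.5 means chance level; 1.0 means every pair is ranked correctly."""
    if not scores_a or not scores_b:
        raise ValueError("both categories need at least one score")
    correct = 0.0
    for a, b in product(scores_a, scores_b):
        if a > b:
            correct += 1.0
        elif a == b:
            correct += 0.5
    return correct / (len(scores_a) * len(scores_b))


# Hypothetical model scores, NOT data from the paper.
category_a = [0.9, 0.8, 0.4]
category_b = [0.7, 0.3, 0.2]
print(round(pairwise_accuracy(category_a, category_b), 2))  # 0.89 (8 of 9 pairs)
```

Comparing every pair directly is slow for tens of thousands of people, but it mirrors the definition word for word, which is the point here.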
The researchers could also make decent guesses at non-dichotomous attributes. Below are the correlation strengths between their predictions and the actual values. A correlation coefficient above 0.3 is considered moderate, and one above 0.5 is strong.
The test-retest reliability for openness (the correlation between the two scores one would get if one took the test twice) is 0.50, pretty close to the 0.43 correlation between the Likes-based predictions and the actual scores, so for that trait, looking at someone’s Likes is almost as good as looking at their actual personality test score.
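For readers who want to see what a correlation coefficient measures, the following Python sketch computes the standard Pearson coefficient between predicted and actual values and labels it using the rule of thumb above (above 0.3 moderate, above 0.5 strong). The function names and the "weak" label for everything else are my own assumptions, and the sample numbers are invented, not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def strength_label(r):
    """Rule-of-thumb label; 'weak' is an assumed name for values of 0.3 or less."""
    size = abs(r)
    if size > 0.5:
        return "strong"
    if size > 0.3:
        return "moderate"
    return "weak"


# Invented predicted vs. actual trait scores, for illustration only.
predicted = [2.0, 3.5, 3.0, 4.5, 4.0]
actual = [2.5, 3.0, 4.0, 4.0, 3.5]
r = pearson_r(predicted, actual)
print(strength_label(r))
print(strength_label(0.43))  # moderate
```

By this rule of thumb, the 0.43 figure for openness counts as a moderate correlation.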
The volunteers had a median of 68 Likes (ranging from 1 to 700), but the researchers found that knowing just one random Like from a person usually gave them a better-than-chance guess at the person’s gender, age, and openness. And the connections aren’t always obvious. For instance, the researchers point out, fewer than 5% of the gay volunteers Liked explicitly gay groups, but Liking Desperate Housewives was kind of a giveaway. (Okay, that one is maybe a bit obvious.)
Stillwell wrote me in an email:
My favourite part is that whereas previous researchers have linked behaviour online with personal traits, Facebook Likes have a meaning that we can use to understand the psychology behind what people do. For example, in the supplemental table for "parents separated at 21" a couple of the most predictive Likes of parental separation are for the Likes: "I'm sorry I love you" and "If I'm with you then I'm with you I don't want anybody else". So although our prediction is not very good (60%, which is just above chance at 50%), it gives us a poignant insight into the effects that parental breakup has on children even after they grow up. It was surprising to us that parental breakup has any effect at all on the things you choose to Like.
Certainly more attributes than those studied here could be detected using Facebook Likes, or using other online behavior. That could be great: targeted content, no more irrelevant ads. But the potential downside is also significant. The authors write: “Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation, or political views that an individual may not have intended to share. One can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life.”
I asked if the findings would change what the authors Like on Facebook. Stillwell replied:
I think it’s like deciding what to wear in the morning: We all know that choosing clothes is important because it gives others a certain impression about us. It’s the same with Likes. As long as you’re happy being associated with a certain Like, and you’re aware of who you’re sharing that with and what they might infer from that, then it’s an opportunity to control your image. We’re expecting a sudden influx of people liking "Curly Fries"!
Why curly fries? In supplementary online data, the researchers list the 400 or so Likes that are the best indicators of the attributes they measured. I’ve offered a small selection below. Each of these things was Liked by at least 100 of their volunteers.