What Your Social Media Use Says About You
A conversation with Christian Rudder, author of Dataclysm
Posted September 9, 2014
Christian Rudder, co-founder of the dating site OkCupid, has written a book called Dataclysm. He presents data in very creative ways to highlight various patterns. And the book is not just about dating. Rather, it’s about how the large sources of data that you and I contribute to, simply by using our digital devices and social media sites on a regular basis, are revealing a lot of interesting things about people. For example, how we present ourselves is often not at all the same thing as what we are really like inside. I had the opportunity to talk with him about everything from how the NSA recruits the smartest people from Harvard and other elite schools to how blind, non-theory-driven algorithms, when turned on huge sources of data, can teach us about human nature.
JON: How do you think huge data sources from OkCupid, Facebook, Twitter, Reddit, Craigslist, and other places might be overturning the received wisdom in the social sciences? For example, you note that “almost all of its foundational ideas were established on small batches of college kids” and “the full truth of data is only revealed over a large sample.”
You talk about how it’s not as exciting when large samples replicate a finding, but I actually think that’s really, really important.
I think it is very important; it just doesn’t have the same sizzle. You know just as well as I do that sometimes people have a finding and it’s just counterintuitive enough to make a press release. And validating something, even if it’s really important, isn’t going to be as press-release-worthy. With either outcome, you’re getting to the truth of a question through hopefully more powerful means by using large samples. Whether that finding validates or contradicts the perceived wisdom, it’s almost equally important. I definitely don’t want it to seem that I’m saying the datasets I’m using in my book have overturned so much of what people have understood, or that it’s going to throw out decades of established research. I think that would be unfair, because the research by and large is probably correct in most instances.
I think the large samples will help weed out what is not correct and also solidify what is.
Yeah exactly. I think it offers a tremendous opportunity to refine our understanding. I definitely do not think that the existing body of social research which wasn’t done with this big data is somehow invalid because of that fact. I think it’s just weaker because of that fact and will be made stronger. And there will be questions that we haven’t been able to approach at all because of social desirability bias or lack of reach into certain populations.
You mention how a person’s “like” pattern on Facebook could even be used as a proxy for an intelligence or IQ score. How far away are we from developing something like that?
Well, I think we are still far away from saying with any real certainty how smart any one person is based on Facebook likes. In aggregate, finding out that people who like X, Y, Z have traits A, B, C, D, I think we’re already there. We’re already tackling life history questions based on Facebook likes. For example, did your parents get divorced before you were 21? They can unlock that with 60% certitude. Given that it’s only a few years’ worth of likes, imagine that it’s in five years and there’s that much more data to go on, and people are revealing other parts of their lives through their smartphones and their laptops. It’s in a good place now but it’s only going to get better.
You note that using a cut-and-paste strategy is roughly 75% as effective as writing something original in securing a response from a dating party, a strategy mostly employed by guys. Why do you think templates are so successful and what are the implications of this on a broader level?
For me the most interesting part of that fact is that it shows how iteration, which is something companies or algorithms tend to do, works for people as well. The templates do as well as they do even though they aren’t tailored at all to the people who receive them. Basically it’s a message that a person has refined, and even though it’s not handcrafted he goes and blasts it out. And it’s important to realize that from the recipient’s perspective [in dating], they don’t know that so many other people have gotten that message, so to them it’s just as unique as if you had written it by hand. So what does it say about the process of online dating? Because so much can happen in parallel, whereas in a normal conversation people can only say something one at a time, it allows economies of scale and “business best practices” that can be applied by normal individuals. It’s an interesting thing to think about in a general culture where people are acting more corporate, for example with the idea of personal branding or being your own CEO. This phenomenon is part and parcel of that, I think.
You write “People saying one thing and doing another is pretty much par for the course in social science.” In terms of data from OkCupid and other dating sites, what was the most surprising finding that highlights this principle?
The thing that surprised me the most was the discrepancy between what men will enter as their preferred age parameters for a female partner and how they actually vote. So for example, say they are 38, they would typically enter something around 32 to 43, but then they would go and tell the site that 20 year old women are the hottest. And that’s static, that’s a fact: men who are 18, 28, 38, 48 all think that women in their 20s are the best looking. So there are really three parameters. There is who I think I’m attracted to. There’s who I indicate I’m attracted to by voting. And then there are the people who I actually try to message and go out with and hit on. And that’s where pride, self-confidence, and social propriety all mix in to split the difference between the lower range of what I say I want and who I indicate I like through my vote. So if I’m 38 I say that I’m willing to date someone 32 but I indicate that 20 year olds are the hottest. But I’m messaging 27 year olds because that is the youngest age I think I can get away with. And not to bring it all back to economics, but you can really see how the emotional forces play out in the numbers.
You point out that the power of large data is amplified once we are able to collect this data over many decades. What questions do you think such longitudinal studies will answer?
A person’s views and cultural tastes change as they get older. For example, you usually like the music you liked in high school for your whole life. My dad still listens to The Four Seasons and The Beach Boys and graduated in ’64. That’s not exactly true for me, but pretty close. And this question is pretty hard to answer with the current datasets. Having some way to look at how people’s minds change (not biologically but psychologically) over time would be amazing, measuring their tastes and preferences. You could plot social views versus economic views. A typical path through life could be that you start off socially and economically liberal, and when you get a job there’s some tension there, and then maybe when you’re older and no longer working you become more economically liberal. And it would be cool to be able to trace that beyond anecdotes and beyond polls.
I feel like it could answer a lot of questions in human development.
Yeah exactly. There’s so much research on early childhood development and how a child becomes an adult, but there isn’t as much developmental stuff about how adults stay adults or get older especially because there isn’t a parent there reigning over every footstep once you’re 21. It would be incredibly interesting because people do keep changing in some ways and they tend to stay the same in others, and it would be amazing to know what those ways are, you know.
You talk about Math 55 at Harvard, and how places like the NSA recruited talent from this elite group. Could you expand on that?
The people at the NSA are actually recruited from the top tier of math talent. The best CS majors and the best math majors were all recruited by the government. I think there is this idea, probably because people like to talk about the stereotype of government workers, that even those working in a security capacity are somehow kind of bureaucratic, but the people in the NSA are certainly the hardest core of the hardcore, especially in the cryptography and technology realm. They recruited mathematicians with the big capital M, which I would never call myself at all even if people put that in a bio about me. There’s a big difference between being a math major and a mathematician.
You note the words that are representative of men and women across various racial/ethnic groups based on data from Google. You write: “Sometimes, it takes a blind algorithm to really get vision into the data.” Could you expand on what you found there, what surprised and intrigued you?
What I meant by the exact quote is that anyone can pick stereotypes, the white person and redneck one for example, but if you’re doing the stereotype you’re probably not of that group. So that’s the power of the algorithm: it just kicks out whatever is there, not what we think it should be. With that you find white people are into country music and camping, but they also talk a lot about their eyes and hair. That’s because white people do have different kinds of eyes and hair, whereas Asians and Latinos and Black people generally do not; their hair is usually all brown or black. Whereas for white people you get blonde, red, or whatever. Doing this algorithmically allows you to take your prejudices out of it and paradoxically enriches what you get back out from the data.
© 2014 by Jonathan Wai