Spurious Correlations by Tyler Vigen: A Book Review

Public school enrollment and sour cream consumption are highly correlated.

Posted Aug 04, 2015

Tyler Vigen’s book, Spurious Correlations, is warm and funny, and it makes several very important points. According to Vigen, his book is based on dozens of correlations between completely unrelated sets of data. He relied on a computer to generate Pearson product-moment correlations (r) between essentially random pairings of meaningless variables, such as public high school enrollment and sour cream consumption. And in fact, the correlation between public high school enrollment and sour cream consumption is quite high, r = .95. Not only is this relationship random and meaningless, but the larger problem is that we see nonsense like this every day, and people base conclusions on such correlations. For example, suppose I am concerned about school enrollment. Does this high correlation mean that if I eat more sour cream, more kids will stay in school?
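For readers who want to see what r actually measures, the standard textbook definition of the Pearson product-moment correlation for n paired observations (x_i, y_i), with sample means x̄ and ȳ, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator captures how the two variables move together. Dividing by their spreads scales r so that it always falls between -1 and +1. A value of .95 is therefore close to a perfect positive linear relationship, which by itself says nothing about whether either variable influences the other.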

Vigen’s book is a lot of fun because it contains nearly two hundred of these silly, random correlations, all derived from serious databases. For example, when he correlated data from the Centers for Disease Control and Prevention (CDC) with data from the Internet Movie Database, he found that Ben Affleck film appearances have a very high correlation with accidental poisonings by pesticides, r = .92. Does this mean that Ben Affleck films cause accidental poisonings by pesticides? Of course not. As every undergraduate psychology major knows, correlation does not imply causation. A correlation is simply a mathematical relationship between two data sets: it means that two variables go together, or covary.
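Mining many data sets for matches helps explain why such high values turn up so readily. The Python sketch below is not Vigen's procedure. It is a hypothetical illustration that generates 200 series of pure random noise with ten values each (both numbers are arbitrary assumptions). It then computes Pearson's r for every one of the 19,900 possible pairs and reports the strongest. Because the series are independent by construction, anything it finds is spurious. Even so, with short series and thousands of comparisons, a correlation above .8 in absolute value is practically guaranteed. The function names are illustrative only.

```python
import itertools
import random
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def strongest_spurious_pair(n_series=200, length=10, seed=1):
    """Hypothetical demo: build unrelated noise series, return the most correlated pair.

    Every series is independent Gaussian noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(n_series)]
    # Check every possible pair and keep the one with the largest |r|.
    i, j = max(
        itertools.combinations(range(n_series), 2),
        key=lambda pair: abs(pearson_r(series[pair[0]], series[pair[1]])),
    )
    return i, j, pearson_r(series[i], series[j])


if __name__ == "__main__":
    i, j, r = strongest_spurious_pair()
    print(f"series {i} and series {j}: r = {r:.2f}")
```

Making each series longer shrinks these chance correlations, while adding more series makes extreme values easier to find. That trade-off is the heart of the multiple-comparisons problem.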

While fun and silly, this book demonstrates many important principles. Along with 1) be cautious when interpreting data and 2) correlation does not imply causation, there is a third concept: 3) spurious correlation. Indeed, Vigen’s book is titled Spurious Correlations. Strictly speaking, a spurious correlation is one in which the relationship between two strongly correlated variables is actually explained by a third variable. This is where Vigen’s book gets even more interesting. Here is another example: March Madness TV ad revenue and the number of breweries in the United States correlate at r = .94. So as ad revenue increases, so does the number of breweries. Could both be explained by a booming economy? A better economy means more money to spend on everything, including TV ads and breweries. This suggests another social science principle, the Law of Parsimony, which holds that when things are ambiguous, the simplest explanation that accounts for the most observations is the best.

Hmm... now things are getting complex. It is not enough to observe a correlation between variables and jump to a conclusion. Unfortunately, this happens all the time, which is why this book is such a great adjunct to a formal class in statistics. It becomes obvious that social science is about reasoning and logic, not just random computer-generated correlations. We use deductive reasoning to form hypotheses and inductive reasoning to test them, and we carefully replicate our findings before jumping to conclusions. Social science research is fundamentally an exercise in logic.

Unfortunately, in the age of big data this is not happening enough. Daily, we are overwhelmed with data. I cannot even eat a chocolate from See’s without knowing how many calories it will cost me. Scientists race to publish findings, and negative results often go unpublished. Media and teachers grasp at the quickest conclusion and spread it around like gossip or like children playing telephone. Everything happens very rapidly, without much critical thought or examination. And this is exactly why Vigen’s book is so important. By poking fun at meaningless correlations, he calls attention to sloppy thinking. Read this book for the fun of it, and then stop and think about the implications of all the meaningless conclusions we form every day.