Thomas J. Leeper


Lies, Damned Lies, and Genetics Statistics

Correlation is not causation. In genetics, all we get are correlations.

Posted Apr 25, 2012

This is the third of three posts reflecting on the rise of genetic, evolutionary, and biological approaches to the study of politics. Today’s post focuses on skepticism among political scientists about the evidence for genetic bases of political phenomena and on the challenges for genetic evidence to satisfy the criteria for being seen as causes as opposed to correlates of those phenomena. 

Genetic explanations of social and political behavior necessarily tip toward the ‘nature’ side of the ‘nature versus nurture’ debate, and so raise profound philosophical questions about genetic determinism and an absence of self-control. Even so, the most important critiques of “genopolitics” research are methodological rather than substantive. Genetic research shares with all other theories the burden of establishing causal, as opposed to correlational, relationships between genes and political opinions.

Unfortunately, humans are quite poor at causal reasoning. We have trouble distinguishing cause from correlation because we are not taught how to tell the difference, other than repeating the mantra that one is not the other. We tend to see causality largely in terms of temporal ordering: things that come before cause things that come after, regardless of whether they are actually related and whether one in fact caused the other as opposed to being products of a third antecedent factor.

Why is this problematic? We can begin with a little background on causation. In a widely cited 1986 article (gated, ungated), Paul Holland summarized the most widely adopted model for thinking about causation, which was developed earlier by philosophers, like David Hume and J.S. Mill, and statisticians, like R.A. Fisher, Jerzy Neyman, and Donald Rubin. The basic idea is that to understand the “causal effect” of one factor on an outcome, we need to think “counterfactually.” That is, we can only observe an outcome (e.g., someone’s political ideology today) that results from a single iteration of the universe. To know the effect of some factor that preceded today, we need to know what that person’s outcome (e.g., their ideology) would have been had everything that preceded today stayed the same except for the absence of that one factor. We can say that a factor caused someone’s ideology to the extent that their ideology would have been different if, all else equal, they instead had been exposed to a different value of that factor.

The challenge emerges because knowing what that counterfactual outcome would be is impossible at the individual level. We can only observe one iteration of the universe and can never know what the effect of any particular factor was on a given individual. Some statisticians, like Rubin, go so far as to say “no causation without manipulation,” meaning that we can never see causation at all unless we can manipulate the putatively causal factor in an experiment. The reason for Rubin’s bold claim is that unless the researcher has manipulated a given factor, any apparent relationship between one factor and an outcome may mask other unobserved factors that also cause the outcome and confound our inference.
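A small simulation can make the point concrete. The sketch below is illustrative only: the effect size, the unobserved factor `u`, and the function name `simulate` are assumptions invented for this example, not anything taken from Holland or Rubin. Every simulated person has two potential outcomes, but only one is ever recorded. When a coin flip decides who is exposed to the factor, the difference between group averages lands close to the assumed effect. When exposure depends on the unobserved factor, the same comparison is badly off.

```python
import random


def simulate(n=20000, seed=0):
    """Compare a randomized and a confounded estimate of one assumed effect."""
    rng = random.Random(seed)
    true_effect = 1.0  # assumed effect of the factor (hypothetical value)

    rand_exposed, rand_unexposed = [], []
    obs_exposed, obs_unexposed = [], []

    for _ in range(n):
        u = rng.gauss(0, 1)               # unobserved third factor
        y0 = 2.0 * u + rng.gauss(0, 1)    # outcome WITHOUT the factor
        y1 = y0 + true_effect             # outcome WITH the factor

        # Experiment: a coin flip decides exposure, independent of u.
        # Only one of the two potential outcomes is ever recorded.
        if rng.random() < 0.5:
            rand_exposed.append(y1)
        else:
            rand_unexposed.append(y0)

        # Observation: people with high u are more likely to be exposed,
        # so exposure and outcome share a hidden cause.
        if u + rng.gauss(0, 1) > 0:
            obs_exposed.append(y1)
        else:
            obs_unexposed.append(y0)

    def mean(values):
        return sum(values) / len(values)

    randomized_estimate = mean(rand_exposed) - mean(rand_unexposed)
    observational_estimate = mean(obs_exposed) - mean(obs_unexposed)
    return randomized_estimate, observational_estimate


if __name__ == "__main__":
    randomized, observational = simulate()
    print(f"randomized comparison:    {randomized:.2f}")
    print(f"observational comparison: {observational:.2f}")
```

Under these assumptions the randomized comparison should come out near the assumed effect of 1.0, while the observational comparison comes out far larger, because it mixes the effect of the factor with the effect of the hidden variable.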

When thinking about genetics, this means that we can never know for sure what the effect of a given gene is on an individual’s opinions or ideology. To know causation therefore requires group-level observation: we can only see the effect of a given cause as the difference between groups of people who have one gene and people who lack that gene but are otherwise similar. Yet, despite the apparent simplicity of that comparison, genetic testing is rare, it is difficult if not impossible to isolate genes that are meaningful on their own (as opposed to having impacts in combination with other genes), and genetic research on social and political behavior rarely looks directly at genes. Instead, “biopolitics” approaches involve physiological metrics (e.g., testosterone production or the 2D:4D Ratio, both of which seem to predict aggressive behavior in simulated military conflicts; see here and here) and/or predictions drawn from biologically motivated theory. That is, research frequently attributes causation to genetics without directly observing (or manipulating) genes.

Indeed, the highly publicized study by John Hibbing and collaborators that I have discussed in my previous posts this week relied on one of the most suspicious empirical methods in use by genetics researchers: the twin study. In twin studies, fraternal and identical twins are compared on various outcomes and differences found to be larger between fraternal twins than identical twins are taken as evidence of causal effects of genetics because identical twins are (as the term implies) genetically identical.
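To see the logic of that comparison in miniature, here is a rough sketch using one common textbook shortcut, Falconer's formula, which estimates heritability as twice the gap between the identical-twin and fraternal-twin correlations. This is not necessarily the method used in the Hibbing study. The variance shares (`h2`, `c2`), the sample size, and all function names are hypothetical choices made for the example, and the simulation builds in the assumption that both kinds of twins share their environments to the same degree.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def twin_pairs(n, shared_genes, rng, h2=0.5, c2=0.2):
    """Simulate a trait for n twin pairs.

    shared_genes is the assumed genetic correlation within a pair:
    1.0 for identical twins, 0.5 for fraternal twins.
    h2 and c2 are assumed variance shares for genes and shared environment.
    """
    e2 = 1.0 - h2 - c2  # remaining share: environment unique to each twin
    first, second = [], []
    for _ in range(n):
        common_genes = rng.gauss(0, 1)
        shared_env = rng.gauss(0, 1)  # same for both twins, by assumption
        traits = []
        for _twin in range(2):
            own_genes = rng.gauss(0, 1)
            genes = (math.sqrt(shared_genes) * common_genes
                     + math.sqrt(1.0 - shared_genes) * own_genes)
            unique_env = rng.gauss(0, 1)
            traits.append(math.sqrt(h2) * genes
                          + math.sqrt(c2) * shared_env
                          + math.sqrt(e2) * unique_env)
        first.append(traits[0])
        second.append(traits[1])
    return first, second


def falconer_estimate(n=20000, seed=1):
    rng = random.Random(seed)
    r_mz = pearson(*twin_pairs(n, 1.0, rng))  # identical twins
    r_dz = pearson(*twin_pairs(n, 0.5, rng))  # fraternal twins
    return r_mz, r_dz, 2.0 * (r_mz - r_dz)    # Falconer's formula


if __name__ == "__main__":
    r_mz, r_dz, h2_hat = falconer_estimate()
    print(f"r_MZ={r_mz:.2f}  r_DZ={r_dz:.2f}  estimated heritability={h2_hat:.2f}")
```

The estimate recovers the assumed genetic share only because the simulated world obeys the formula's assumptions. Note also what the output is: a single number about variance in a population, with no information about which genes are involved.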

Twin studies are not as elegant as they appear. In particular, twins are different from “singletons” simply because they are twins. Twins experience different hormonal environments in utero than singletons (as is evidenced in discussions of the 2D:4D Ratio) and, as any twin (or other multiple) can likely attest, they are raised differently from singletons, and identical twins may even be raised and socialized differently from fraternal twins. Consequently, twins do not serve as a particularly representative sample for understanding genetic and environmental effects on the whole of the population (over 96 percent of whom are born as singletons). And the differences between sets of twins do not say anything particularly specific about the genetic causes of anything. Twin studies do not help to identify which genes cause what, only that environmental factors are not likely to be causes. Twin studies cannot reveal any genetic cause, let alone a “liberal” or “conservative” gene. (Hibbing and coauthors have recently agreed that twin studies are limited, but used alternative methods to continue to show that 40 to 60 percent of political ideology is heritable.)

These concerns fully undermine twin studies. But methodological rivals to twin studies raise additional concerns, which were elegantly discussed in a recent American Political Science Review article by Evan Charney and William English (gated; ungated), in response to an earlier article by James Fowler and Christopher Dawes (UC-San Diego). (See here for some additional commentary.)

Their focus is on “candidate genes” studies: research on specific genes thought to cause a particular political outcome, analogous to how mutations in the BRCA genes are known to raise the risk of breast cancer. The hunt for “liberal” or “conservative” genes, or for genes that predict more general (but still politically relevant) phenomena, like aggressiveness, is fodder for media hype, but not necessarily rigorous (social) science.

While Charney and English raise a number of critiques, among the most important is that “genetic” research that relies on correlations between genes and political or social outcomes (e.g., opinions or behavior) has not established that genes are the cause of those outcomes. They summarize their findings toward this end in a nice, long table, which outlines how four commonly studied genes predict everything from voting to contraception use to epilepsy. While these genes may cause all of these outcomes, more likely than not all of these relationships are spurious and not evidence of causation. Simple correlations between a given gene and political outcomes seem to be perfect examples of how correlations mask unobserved causal relationships—the kind of thing Rubin would be particularly suspicious of.
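One way such spurious relationships can arise, offered here as an illustration rather than as Charney and English's own argument, is sheer volume: test one gene against enough outcomes and some correlations will clear a conventional significance bar by chance. In the sketch below, carrier status and every outcome are generated independently, so any “finding” is noise. The sample size, carrier rate, number of outcomes, and the rough 1.96/sqrt(n) cutoff are all assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def count_chance_hits(n_people=500, n_outcomes=200, seed=2):
    """Count outcomes that 'correlate' with a gene unrelated to all of them."""
    rng = random.Random(seed)

    # Hypothetical carrier status: 1.0 if the person carries the variant.
    gene = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n_people)]

    # Approximate cutoff for a correlation to look "significant" at the
    # 5 percent level in a sample of this size.
    cutoff = 1.96 / math.sqrt(n_people)

    hits = 0
    for _ in range(n_outcomes):
        # Each outcome is pure noise, unrelated to the gene by construction.
        outcome = [rng.gauss(0, 1) for _ in range(n_people)]
        if abs(pearson(gene, outcome)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_chance_hits()
    print(f"{hits} of 200 unrelated outcomes passed the cutoff")
```

With a 5 percent cutoff, roughly one outcome in twenty should pass on average even though nothing in the simulation is causally connected to anything else.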

While we may long to find a gene (or two, or three, etc.) that predicts one’s political ideology and allows us to trace the heritability of liberalism and conservatism from parent to child, the fact of the matter is that we haven’t found that gene and no published research has found anything remotely close to it. Headlines saying that there are genetic differences between liberals and conservatives or that genes predict our political behavior are often more media hype than rigorous evidence. To the extent that we still rely on twin studies and the subset of readily identifiable “candidate genes,” political science does not yet have the tools to identify causal relationships between any particular gene and any political outcome. Even if promising genes are found, the public should be cautious about whether a study has really shown a causal relationship. And because the dimensions on which political ideology is organized change over time (as one astute reader pointed out earlier this week), what seems to predict liberalism today may predict conservatism, or nothing, two decades from now.

Is your political ideology a result of your genetics? A good question, but one we don’t have a good answer to.

About the Author

Thomas J. Leeper

Thomas J. Leeper is a Ph.D. candidate in political science and a graduate fellow of the Institute for Policy Research at Northwestern University.
