The Implicit Assumptions Test
Does the IAT measure what proponents claim it does?
Posted Feb 07, 2015
Let’s say you have a pet cause to which you want to draw attention and support. There are a number of ways you might go about trying to do so, honesty being perhaps the most common initial policy. While your initial campaign is met with a modest level of success, you’d like to grow your brand, so to speak. As you start researching how other causes draw attention to themselves, you notice an obvious trend: big problems tend to get more support than smaller ones. A medical condition affecting 1-in-4 people is treated much differently than one affecting 1-in-10,000. Though you realize it sounds a bit perverse, if you could somehow make your pet problem a much bigger one than it actually is – or at least make it seem that way – you would likely attract more attention and funding. There’s only one problem standing in your way: reality. When most people tell you that your problem isn’t much of one, you’re kind of out of luck. Or are you? What if you could convince others that what people are telling you isn’t quite right? Maybe they think your problem isn’t much of one but, if their reports can’t be trusted, now you have more leeway to make claims about the scope of your issue.
This brings us once again to the matter of the Implicit Association Test, or IAT. According to its creators, the IAT “…measures attitudes and beliefs that people may be unwilling or unable to report,” making that jump from “association” to “attitudes” in a timely fashion. This kind of test could serve a valuable end for the fundraiser in the above example, as it could potentially increase the perceived scope of your problem. Not finding enough people who are explicitly racist to make your case that the topic should be getting more attention than it currently is? Well, that could be because racism is, by and large, a socially undesirable trait to display and, accordingly, many people don’t want to openly say they’re racist even if they hold some racial biases. If you had a test that could plausibly be interpreted as saying that people hold attitudes they explicitly deny, you could talk about how racism is much more common than it seems to be.
This depends on how one interprets the test, though: all the IAT measures is very fast, immediate reaction times when it comes to pushing buttons. I’ve discussed the IAT on a few occasions: first with regard to what precisely the IAT is (and might not be) measuring and, more recently, with respect to whether IAT-like tests that use response times as measures of racial bias are actually predicting anything when it comes to actual behaviors. The quick version of both of those posts is that we ought to be careful about drawing a connection between measures of reaction time in a lab and racial biases in the real world that cause widespread discrimination. In the case of shooting decisions, for instance, a more realistic task in which participants were using a simulation with a gun instead of just pressing buttons at a computer resulted in the opposite pattern of results from what many IAT tests would predict: participants were actually slower to shoot black suspects and more likely to shoot unarmed white suspects. It’s not enough to just assume that, “of course these different reaction times translate into real world discrimination”; you need to demonstrate it first.
This brings us to a recent meta-analysis of some IAT experiments by Oswald et al (2014) examining how well the IAT did at predicting behaviors, and whether it was substantially better than the explicit measures being used in those experiments. There was, apparently, a previous meta-analysis of IAT research that did find such things – at least for certain, socially-sensitive topics – and this new meta-analysis seems to be a response to the former one. Oswald et al (2014) begin by noting that the results of IAT research have been brought out of the lab into practical applications in law and politics, a matter that would be more than a little concerning if the IAT wasn’t actually measuring what many interpret it to be measuring, such as evidence of discrimination in the real world. They go on to suggest that the previous meta-analysis of IAT effects lacked a degree of analytic and methodological validity that they hope their new analysis will address.
For example, the authors were interested in examining whether various experimental definitions of discrimination were differentially predicted by the IAT and explicit measures, whereas they had all been lumped into the same category by the previous analysis. Oswald et al (2014) grouped these operationalizations of discrimination into six categories: (1) measured brain activity, which is a rather vague and open-to-interpretation category, (2) response times in other tasks, (3) microbehavior, like posture or expression of emotions, (4) interpersonal behavior, like whether one cooperates in a prisoner’s dilemma, (5) person perception (i.e., explicit judgments of others), and (6) political preferences, such as whether one supports policies that benefit certain racial groups or not. Oswald et al (2014) also added some more recent studies that the previous meta-analysis did not include.
While there is a lot to this paper, I wanted to skip ahead to discussing a certain set of results. The first of these results is that, in most cases, IAT scores correlated very weakly with the discrimination criterion being assessed, averaging a meager correlation of 0.14. To the extent that the IAT is actually measuring implicit attitudes, those attitudes don’t seem to have much of a predictable effect on behavior. The exception to this pattern was in regard to the brain activity studies: that correlation was substantially higher (around 0.4). However, as brain activity per se is not a terribly meaningful variable when it comes to its interpretation, whether that tells us anything of interest about discrimination is an open question. Indeed, in the previous post I mentioned, the authors also observed an effect for brain activity, but it did not mean people were biased toward shooting black people; quite the opposite, in fact.
The second finding I would like to mention is that, in most cases, the explicit measures of attitudes toward other races being used by researchers (like this one or this one) were also very weakly correlated with the discrimination criterion being assessed, with an average correlation about the same size as that of the implicit measures, at 0.12. Further, this value is apparently substantially below the value achieved by other measures of explicit attitudes, leading the authors to suggest that researchers really ought to think more deeply about what explicit measures they’re using. Indeed, when you’re asking questions about “symbolic racism” or “modern racism”, one might wonder why you’re not just asking about “racism”. The answer, as far as I can tell, is that, proportionately, very few people – and perhaps even fewer undergraduates, the population most often being assessed – actually express openly racist views. If you want to find much racism as a researcher, then, you have to dig deeper and kind of squint a little.
The third finding is that the above two measures – implicit and explicit – really didn’t correlate with each other very well either, averaging only a correlation of 0.14. As Oswald et al (2014) put it:
“These findings collectively indicate, at least for the race domain…that implicit and explicit measures tap into different psychological constructs—none of which may have much influence on behavior…”
In fact, the authors estimate that the implicit and explicit measures collectively accounted for about 2.5% of the variance in discriminatory criterion behaviors concerning race, with each adding about a percent or so over and beyond the other measure. In other words, these effects are small – very small – and do a rather poor job of predicting much of anything.
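To get a rough feel for why correlations this size translate into so little explained variance, one can square them. The sketch below is my own back-of-the-envelope illustration, not the authors' analysis: the function name `variance_explained` is hypothetical, and the inputs are simply the average correlations quoted above (0.14, 0.12, and roughly 0.4). It treats each correlation on its own, so it will not reproduce the paper's combined 2.5% figure or its incremental estimates.

```python
def variance_explained(r: float) -> float:
    """Share of variance accounted for by a single predictor with Pearson correlation r."""
    return r ** 2


# Average correlations as quoted in this post (assumed inputs, not re-derived from the paper).
correlations = {
    "IAT vs. discrimination criteria": 0.14,
    "explicit measures vs. discrimination criteria": 0.12,
    "IAT vs. brain activity (approximate)": 0.40,
}

for label, r in correlations.items():
    # Squaring r gives the proportion of variance shared by the two measures.
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.1%}")
```

Run as written, this puts the IAT and the explicit measures at roughly 2% and 1.4% of the variance respectively, while even the comparatively strong brain activity correlation comes to only about 16%.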
We’re left with a rather unflattering picture of research in this domain. The explicit measures of racial attitudes don’t seem to do very well at predicting behaviors, perhaps owing to the nature of the questions being asked. For instance, in the symbolic racism scale, the answer one provides to questions like, “How much discrimination against blacks do you feel there is in the United States today, limiting their chances to get ahead?” could have quite a bit to do with matters that have little, if anything, to do with racial prejudice. Sure, certain answers might sound racist if you believe there is an easy answer to that question and anyone who disagrees must be evil and biased, but for those who haven’t already drunk that particular batch of kool-aid, some reservations might remain. Using the implicit reaction times also seems to blur the line between actually measuring racist attitudes and many other things, such as whether one holds a stereotype or whether one is aware of a stereotype (forgoing the matter of its accuracy for the moment). These reservations appear to be reflected in how very bad both methods seem to be at predicting much of anything.
So why do (some) people like the IAT so much even if it predicts so little? My guess, again, is that a lot of its appeal flows from its ability to provide researchers and laypeople alike with a plausible-sounding story to tell others about how bad a problem is in order to draw more support to their cause. It provides cover for one’s inability to explicitly find what one is looking for – such as many people voicing opinions of racial superiority – and allows a much vaguer measure to stand in for it instead. Since more people fit that vaguer definition, the result is a more intimidating-sounding problem; whether it corresponds to reality can be beside the point if it’s useful.
References: Oswald, F., Blanton, H., Mitchell, G., Jaccard, J., & Tetlock, P. (2014). Predicting racial and ethnic discrimination: A meta-analysis of IAT criterion studies. Journal of Personality & Social Psychology, 105, 171-192.