Social Problems and Human Cognition
The category-expansion effect revisited
Posted Jul 16, 2018
A little rhetoric is a dangerous thing. ~ Rhetoclitus of Rhodes
The analysis of social problems is part of the portfolio of sociology. Some psychologists, and social psychologists in particular, specialize in demonstrating flawed perceptions and judgments among research participants, while asserting that such psychological failures contribute to our understanding of the social world and its discontents. A common strategy is to diagnose and demonstrate the existence of a new and problematic phenomenon that sheds new shadow on an already gloomy outlook. Some Harvard researchers have perfected this approach. A new paper in Science Magazine gives a nice illustration (Levari et al., 2018, hereafter LEA).
Following Steven Pinker (2018), another Harvard professor, LEA assert that the world is getting better, while, according to a YouGov survey, most people think things are getting worse. But this need not be a contradiction, because we do not know what respondents had in mind when reporting what they thought ‘all things considered.’ Perhaps they were thinking of climate change. LEA take the contradiction at face value and set out to explain it in terms of human error. They seek to show, in a small experimental window, that humans perceive decline in the face of progress.
The psychological culprit is sensitivity to context. LEA report – correctly – that “psychologists have long known that stimuli are judged in the context of the other relevant stimuli that surround them in space or precede them in time.” Seamlessly, context-sensitivity becomes the mark of intellectual and moral failure. LEA observe that when instances of aggression, as traditionally defined, become rarer, observers expand the category of what counts as aggression (e.g., asking people where they are from). Such category expansion “may lead observers to mistakenly conclude that the prevalence of aggression has not declined.”
LEA report the results of 7 studies. I briefly describe the first because the experimental design is no different for the others. The 21 subjects (Ss) classified each of 1,000 dots as blue or purple. The dots were evenly distributed over the color spectrum so that arguably, but not necessarily, the ‘objective’ boundary between the two categorical color terms lay midway. Notice that this is an assumption or a convention, not a feature of nature. In one [control] condition, the distribution of pre-tested spectrum values was constant over all trials, whereas in the other [experimental] condition, the percentage of dots on the blue side of the spectrum decreased after the first 200 trials. By trial number 351, the probability of sampling a blue dot was a mere .06; that is, Ss were roped into a routine of pressing the purple button.
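The sampling scheme can be sketched in a few lines of Python. The linear ramp between trials 200 and 351 is an assumption for illustration; only the endpoints (.5 and .06) come from the description above.

```python
import random

def p_blue(trial, condition, n_stable=200, t_floor=351, p_final=0.06):
    """Probability that the dot sampled on a given trial lies on the
    blue half of the spectrum. Stable condition: .5 throughout.
    Decreasing condition: .5 for the first 200 trials, then a drop to
    .06 by trial 351 (the linear ramp is an assumption; only the
    endpoints are given in the text)."""
    if condition == "stable" or trial <= n_stable:
        return 0.5
    frac = min((trial - n_stable) / (t_floor - n_stable), 1.0)
    return 0.5 + frac * (p_final - 0.5)

def run_condition(condition, n_trials=1000, seed=1):
    """One subject's sequence of dot types (True = blue side)."""
    rng = random.Random(seed)
    return [rng.random() < p_blue(t, condition) for t in range(1, n_trials + 1)]
```

In the decreasing condition, roughly half the dots are blue over the first 200 trials, but only about 6% are blue from trial 351 onward – hence the routine of pressing the purple button.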
LEA plot the percentage of times that a dot of a given spectral value is classified as blue. In the control condition, the best-fitting line is ogival, suggesting that Ss accepted the half-range of the spectrum as the best boundary between blue and purple. In the experimental condition, the curve is slightly but clearly displaced to the left, meaning that some dots were judged blue that were not so judged in the control condition. In other words, the psychological category boundary between blue and purple shifted toward purple. This does not mean, however, that Ss perceived an increase in the prevalence of a target (blue) when there was in fact a decrease. In the published figure (not reproduced here), each point stands for a type of target dot with a particular spectrum value. Whereas in the control condition all points reflect a similar number of judgments, the critical points in the experimental condition (i.e., those referring to the now-rare blues) aggregate over very few observations.
The shift in the category boundary does not amount to a subjective reversal of an objective trend, as LEA imply in parts of their narrative. Yet why are some purplish dots now labeled blue – besides Ss’ being bored with pressing the purple button 600 times in a row? Recall that LEA gave a nod to the long history of research on context effects in judgment. Unfortunately, they did not make use of this rich and deep history. As it turns out, range-frequency theory (RFT; Parducci, 1965) predicts LEA’s findings without questioning people’s motives or intelligence.
Allen Parducci found that judgments of individual targets, events, or stimuli depend in part on the shape of the distribution of these targets. Individual judgments are a compromise between principles of range and rank. The range principle says that a target’s perceived value (size, magnitude, beauty etc.) is equal to its distance from the minimum value relative to the total range. The rank principle says that a judgment is equal to the proportion of targets with a lower value. When the frequency distribution of a set of stimuli ($ amounts, faces, colored dots) is symmetrical, the range and the rank principles yield the same result. Things get interesting when the distributions are skewed. If the distribution is left-skewed, that is, if there are few low values, the median (50% of the values are lower than this target) is higher than the mid-range; if the distribution is right-skewed, the opposite is the case. RFT predicts, given copious empirical findings, that judgments are a compromise between range and rank effects. The typical demonstration compares judgments of stimuli drawn from right- with left-skewed distributions with the same arithmetic mean. Overall, judgments are higher when the skew is to the left than when it is to the right. In Parducci’s hands, this effect is a testament to the powers of memory, sensitivity, and good judgment. There is no claim that humans should ignore the distribution of their experience and be responsive only to a single external, and objective standard (see Felin, Koenderink, & Krueger, 2017, for a review and critique of the ideology of the ‘all-seeing-eye’).
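The two principles are simple enough to state in code. A minimal sketch in Python, with the compromise weight w = .5 as an assumption (Parducci estimated it from data):

```python
def range_value(x, xs):
    """Range principle: distance from the minimum relative to the total range."""
    lo, hi = min(xs), max(xs)
    return (x - lo) / (hi - lo)

def rank_value(x, xs):
    """Frequency (rank) principle: proportion of targets with a lower value."""
    return sum(v < x for v in xs) / len(xs)

def rf_judgment(x, xs, w=0.5):
    """Judged value as a range-frequency compromise (w = .5 is assumed)."""
    return w * range_value(x, xs) + (1 - w) * rank_value(x, xs)

left_skew  = [1, 6, 8, 9, 10]   # few low values: median (8) above mid-range (5.5)
right_skew = [1, 2, 3, 5, 10]   # few high values: median (3) below mid-range (5.5)
```

Averaged over the stimuli actually presented, judgments come out higher for the left-skewed set (the range component is elevated when most values are high), while a fixed mid-range stimulus such as 5.5 is judged higher in the right-skewed set, where it outranks most of its neighbors.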
How might we apply RFT to LEA’s data? The ranges of the two distributions are the same, but the medians differ. The median purplishness is higher in the experimental condition than in the control condition. By the rank principle, Ss would have judged the same stimuli as less purplish – that is, bluer – when blue dots were rare. So why would Ss draw the category boundary closer to the purple end when there were few blues? RFT suggests that the overall median is a candidate category boundary, but its use is attenuated by the range effect. As a result, the perceptual boundary lies between the mid-range and the median. This is what LEA found.
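On this account, the predicted category boundary is the spectrum value whose compromise judgment equals .5, and it can be located by bisection. The sketch below is self-contained; the 0-to-1 spectrum coding (0 = bluest), the 6% blue rate, and w = .5 are all assumptions for illustration:

```python
import random
import statistics

def rf_boundary(stimuli, w=0.5, iters=60):
    """Spectrum value whose range-frequency judgment equals .5 -- the
    predicted blue/purple boundary -- found by bisection between the
    mid-range and the median (judgment is monotone in the stimulus)."""
    lo, hi = min(stimuli), max(stimuli)
    def judgment(x):
        rng = (x - lo) / (hi - lo)                         # range principle
        rank = sum(s < x for s in stimuli) / len(stimuli)  # rank principle
        return w * rng + (1 - w) * rank
    a, b = sorted(((lo + hi) / 2, statistics.median(stimuli)))
    for _ in range(iters):
        mid = (a + b) / 2
        if judgment(mid) < 0.5:
            a = mid
        else:
            b = mid
    return (a + b) / 2

# Experimental phase: only 6% of the dots fall on the blue half.
r = random.Random(0)
dots = [r.uniform(0, 0.5) if r.random() < 0.06 else r.uniform(0.5, 1)
        for _ in range(650)]
boundary = rf_boundary(dots)  # lies between the mid-range and the median
```

With most of the mass on the purple side, the boundary comes out above the mid-range but below the median: some purplish dots get labeled blue, which is the category-expansion effect.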
Moving from dots and colors to socially and politically relevant material, LEA show category expansion with faces varying in ‘threateningness’ and with research proposals varying in ethicality. When threatening faces and unethical proposals become rarer, the boundary between good and bad shifts toward good to include a few more of the bad. There is no indication of a reversal such that more proposals were judged unethical in the skewed than in the even distribution (or that more faces were seen as threatening). In fact, it appears that individual proposals in the ambiguous range were seen as more ethical (and individual faces as less threatening).
There is another aspect of this study worth mentioning. LEA's design compares judgments in a stable and in a decreasing probability condition. They did not run a condition in which the critical targets were rare throughout. RFT (as well as other theories of judgment; Fiedler & Krueger, 2012) predicts that in such a condition the prevalence of rare targets would be overestimated too, in which case there is no unique threat raised by changing distributions – a lesson my mentors helped me appreciate in graduate school (Krueger, Rothbart, & Sriram, 1989).
Into the weeds
Although RFT fits the data well, a direct test is not possible because subjects made no quantitative judgments (e.g., of blueness) about the stimuli. RFT assumes that raters have a more or less accurate representation of the distribution of the stimulus values, that is, of its range and skew. A simpler possibility is that people just remember the last few stimuli they have categorized. Robert Wilson of the University of Arizona has reanalyzed the LEA data and found evidence for a switching bias. The probability of calling a dot blue was higher when it was preceded by a dot called purple than when it was preceded by a dot called blue. When there are only a few blue dots, this bias entails the category-expansion effect unless the tendency to switch from blue to purple is much stronger than the tendency to switch from purple to blue. Interestingly, the switching bias was also observed when there were as many blues as purples. Here, we might interpret it as an instance of the gambler's fallacy (Croson & Sundali, 2005). After so many blues, they think, a purple dot is due.
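The switching bias is easy to compute from a response sequence. A sketch (the function and the toy sequence are illustrative, not Wilson's analysis code):

```python
def switching_bias(calls):
    """Difference P(call 'blue' | previous call 'purple') minus
    P(call 'blue' | previous call 'blue'); positive values indicate a
    tendency to switch responses across consecutive trials."""
    after_purple = [cur for prev, cur in zip(calls, calls[1:]) if prev == "purple"]
    after_blue = [cur for prev, cur in zip(calls, calls[1:]) if prev == "blue"]
    return (after_purple.count("blue") / len(after_purple)
            - after_blue.count("blue") / len(after_blue))

# A perfectly alternating responder switches on every trial:
alternating = ["blue", "purple"] * 10
# switching_bias(alternating) == 1.0
```

A bias of zero means the current response is independent of the previous one; a value of 1.0 means the subject never repeats a response.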
Croson, R., & Sundali, J. (2005). The gambler’s fallacy and the hot hand: Empirical data from casinos. Journal of Risk and Uncertainty, 30, 195-209.
Felin, T., Koenderink, J., & Krueger, J. I. (2017). Rationality, perception, and the all-seeing eye. Psychonomic Bulletin & Review, 24, 1040-1059.
Fiedler, K., & Krueger, J. I. (2012). More than an artifact: Regression as a theoretical construct. In J. I. Krueger (Ed.). Social judgment and decision-making (pp. 171-189). New York, NY: Psychology Press.
Krueger, J., Rothbart, M., & Sriram, N. (1989). Category learning and change: Differences in sensitivity to information that enhances or reduces intercategory distinctions. Journal of Personality and Social Psychology, 56, 866-875.
Levari, D. E., Gilbert, D. T., Wilson, T. D., Sievers, B., Amodio, D. M., & Wheatley, T. (2018). Prevalence-induced concept change in human judgment. Science, 360, 1465-1467.
Parducci, A. (1965). Category judgment: A range-frequency model. Psychological Review, 72, 407-418.
Pinker, S. (2018). Enlightenment now. New York: Viking.