Written with Leonard Chen


Often, human opinions, judgments, and estimates amount to error-prone measurements. Simple math dictates that when people are attuned to reality at all, averaging their judgments will yield an estimate that is more accurate than its individual components are on average. Galton (1907) illustrated this simple but profound principle by collecting estimates of the weight of an ox at a country fair and showing that the average estimate was more accurate than the individual estimates were on average. Over the years, a number of replications of this finding were published, showing that the content domain does not matter much. Favorites include judging the temperature in a room and estimating the year of some historical event (cf. Larrick, Mannes, & Soll, 2012). Robyn Dawes (1977) once suggested, tongue in cheek, that a person’s height could be measured by averaging judgments obtained from visual inspection. The result is the same. At worst, averaging does not yield an improvement, but often it does. So why not do it?
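The "simple math" here is the triangle inequality: the error of the averaged estimate can never exceed the average of the individual errors. Here is a minimal sketch in Python. The guesses, the assumed true weight, and the function name are made up for illustration; they are not Galton's data.

```python
from statistics import mean

def crowd_vs_individual(estimates, truth):
    """Return (error of the averaged estimate, average error of the individual estimates)."""
    crowd_error = abs(mean(estimates) - truth)
    mean_individual_error = mean(abs(e - truth) for e in estimates)
    return crowd_error, mean_individual_error

# Hypothetical guesses of an ox's weight in pounds; the true weight is assumed to be 1200.
guesses = [900, 1000, 1150, 1300, 1500]
crowd_error, mean_individual_error = crowd_vs_individual(guesses, truth=1200)
print(crowd_error, mean_individual_error)
```

When guesses fall on both sides of the truth, the errors partly cancel and the average wins. When everyone errs in the same direction, the two numbers come out equal, which is the "at worst, no improvement" case.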

Recently, we have learned that individual people can take advantage of the averaging principle by producing more than one estimate of whatever it is they are estimating. Vul & Pashler (2008) showed that averaging a person’s first and second estimates yields an advantage. Herzog & Hertwig (2009) replicated this result and showed that the accuracy increment can be made larger by urging individuals to actively think differently when generating their second estimate. In an earlier post, we took a look at their data and some of our own to show how this works.

Today we offer a small replication of the averaging effect in the simplest of circumstances: estimating the number of M&Ms in a transparent container. We put 1,379 of those colorful beans in a jar. We know this number because we counted the beans. Sixteen students in our psychology class estimated that number from simple visual inspection and guessing. We then gave them the Oliver Cromwell exhortation to “consider, in the bowels of Christ, that they might be mistaken,” and to guess again. They did, and we averaged the two estimates for each person.

Here are the results. The first estimates had a mean of 880 and a standard deviation of 641. That is, a typical estimate lay roughly 641 points from the mean. For the second estimate, the respective numbers were M = 793 and SD = 478. For the averages of the first and the second estimate the numbers were M = 834 and SD = 545. Notice that the difference between the true value and the mean of the first estimates is 1379 – 880 = 499. In contrast, the difference between the true value and an individual’s first estimate is 746 on average. There you have the traditional wisdom-of-the-crowd effect. The average of the judgments is more accurate than individual judgments are on average. In standard units, the size of this effect is .85, which is large. We calculated this effect size by subtracting the value 499 from each individual error, averaging these differences, and dividing the result by their standard deviation.
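For readers who want to try this on their own class, the effect-size recipe can be sketched in Python as follows. The five guesses are invented for illustration (our raw data are not reproduced here), the function name is hypothetical, and we assume the sample standard deviation (n − 1 in the denominator). As a plausibility check on the reported numbers: the mean gain is 746 − 499 = 247, so an effect size of .85 implies a standard deviation of roughly 290.

```python
from statistics import mean, stdev

def crowd_effect_size(estimates, truth):
    """Standardized advantage of the crowd average over the individual estimates."""
    crowd_error = abs(mean(estimates) - truth)
    individual_errors = [abs(e - truth) for e in estimates]
    # How much larger each person's error is than the error of the crowd average.
    gains = [err - crowd_error for err in individual_errors]
    # Subtracting a constant leaves the spread unchanged, so this equals
    # the standard deviation of the individual errors (sample SD assumed).
    return mean(gains) / stdev(gains)

# Hypothetical first guesses; 1379 is the true count from the jar.
guesses = [400, 600, 900, 1500, 2000]
d_crowd = crowd_effect_size(guesses, truth=1379)
print(round(d_crowd, 2))
```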

How does the within-the-head wisdom of the crowd compare? This effect is smaller, to wit, .34 standard units. We calculated this effect size by averaging the differences between the error obtained with the first estimate and the error obtained with the average of the first and second estimates, and dividing this average by its corresponding standard deviation. We also ran a t test for statistical significance. This did not force a rejection of the null hypothesis, but we still sleep at night. The sample was small, and yet there was an effect of the expected size and sign.
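The within-person calculation can be sketched the same way. Again the data and the function name are hypothetical, and we assume a paired t test, for which t equals the effect size times the square root of the sample size. If all 16 students contributed both guesses, the reported effect of .34 gives t = .34 × 4 = 1.36 with 15 degrees of freedom, short of the two-tailed critical value of about 2.13 at the .05 level.

```python
from math import sqrt
from statistics import mean, stdev

def within_person_effect(first, second, truth):
    """Effect size and paired t for averaging each person's two guesses."""
    gains = []
    for g1, g2 in zip(first, second):
        error_first = abs(g1 - truth)
        error_average = abs((g1 + g2) / 2 - truth)
        # Positive gain: the averaged guess is closer to the truth than the first guess.
        gains.append(error_first - error_average)
    d = mean(gains) / stdev(gains)   # sample SD of the gains (assumed)
    t = d * sqrt(len(gains))         # paired t with len(gains) - 1 degrees of freedom
    return d, t

# Hypothetical first and second guesses from four people; 1379 is the true count.
first = [500, 800, 1000, 2000]
second = [700, 700, 1400, 1600]
d, t = within_person_effect(first, second, truth=1379)
print(round(d, 2), round(t, 2))
```

Note that the second person in this made-up sample does worse after averaging. The method promises a gain on average, not for every individual.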

If this works, why not do it? What’s keeping you from giving yourself a second opinion, averaging it with the first, and harvesting the benefits? The answer is that the method, though simple, seems weird to most people. The idea that one’s first estimate of X contains some error is not too hard to swallow, but the idea that some portion of this error is random is. When people make an estimate of X (e.g., the number of lovers Aunt Polly had in her day) they make it with the conviction that the estimate they produce is the best they can do. This must be so by definition; were it not so, people would have come up with a different estimate in the first place. Vul, Herzog, and their collaborators have shown that people are not quite aware of the fluctuations in their own judgments (Vul), or that if they are, they still fail to comprehend that some of the variance is random (Herzog). One can elicit multiple judgments from individuals, but one should not expect people to produce them on their own.

The willingness and ability to generate different opinions, judgments, and estimates bear the stamp of creativity. To be creative, judgments must break out of a mold, a mindset. If initial judgments are the box, then boldly discrepant second judgments are outside of it. Intuitively, many people might feel that going against one’s own initial judgment is irrational and irresponsibly risk-seeking. In fact, the opposite is true. As aggregation increases accuracy, so-called “correspondence rationality” is enhanced and the risk of being wrong is reduced.

As the lessons of Galton, Dawes, Herzog and others are beginning to sink in, we may be seeing interesting new applications. Consider the wisdom of the crowd in the context of goals: How many M&Ms would you like to eat (how many square feet should your house be; how many lovers would you like to love; how many mountains do you want to climb?). Are you sure your first estimate is the end of wisdom? Guess again and split the difference. There may not be an accuracy benefit waiting to be calculated, but perhaps there is an adaptiveness benefit. Again: how many children do you want to have? Now seriously, how many children do you want to have?

Dawes, R. M. (1977). Suppose we measured height with rating scales instead of rulers. Applied Psychological Measurement, 1, 267–273.

Galton, F. (1907). Vox populi. Nature, 75, 450–451.

Herzog, S. M., & Hertwig, R. (2009). The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychological Science, 20, 231–237.

Larrick, R. P., Mannes, A. E., & Soll, J. B. (2012). The social psychology of the wisdom of crowds. In J. I. Krueger (Ed.), Social judgment and decision making (pp. 227–242). New York: Psychology Press.

Vul, E., & Pashler, H. (2008). Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, 19, 645–647.
