The psychology (and statistics) of perfect data

Suppose you ask respondents to express their preference when presented with two hypothetical outcomes, earning a starting salary of $60,000 vs. earning a starting salary of $80,000. Presumably, everyone will prefer having more to having less, unless additional assumptions (such as drug use) are made. In such a study, Hsee & Abelson (1991) found that all of their 48 respondents preferred having more to having less, and they noted that this result was statistically significant (“p < .001, using a chi-square test”). What hypothesis was being tested and rejected? We can reasonably assume that their null hypothesis was that respondents had no systematic preferences or that exactly half preferred having more to having less, which amounts to the same thing. Under that null hypothesis, the probability that all concur in their preference is .5^48 (doubled for a two-tailed test), clearly a highly improbable result.

You might say that the null hypothesis of choosing more instead of less with p = 0.5 in each individual case is an easily refuted straw man. If so, the finding of unanimity is a mere manipulation check. Hsee and Abelson seem to agree, as they noted that “this unequivocal finding not only supported the [absolute value effect] but suggested that subjects were not likely to have responded to the questionnaire haphazardly.” The specter of the straw man raises the question of how a different result would have been interpreted, or how the test could have been conducted differently. Regarding the first question, we note that a majority of 32 out of 48 agreeing on a preference would drive the p value below .05. Ask yourself: How would I feel if 32 of my 48 respondents preferred a higher to a lower starting salary? Would I trust the data? Wouldn’t it be odd to learn that 16 (one third) preferred less to more?
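The arithmetic behind the chance hypothesis can be checked directly. The following is a minimal sketch in Python, assuming an exact two-tailed binomial test with a null probability of 0.5 rather than the chi-square test that Hsee and Abelson reported; the function name is my own and purely illustrative.

```python
from math import comb


def two_tailed_binomial_p(k, n):
    """Exact two-tailed p value for k of n choices under a null of p = 0.5.

    The null distribution is symmetric, so the two-tailed value is twice
    the probability of a split at least as lopsided as the observed one.
    """
    extreme = max(k, n - k)  # size of the larger group
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Unanimity, the 32-of-48 majority discussed above, and one fewer.
for k in (48, 32, 31):
    print(f"{k} of 48: p = {two_tailed_binomial_p(k, 48):.3g}")
```

Under these assumptions, 32 of 48 yields a p value of about .03, whereas 31 of 48 yields about .06, so 32 is the smallest majority that clears the conventional threshold.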

If this hypothetical result casts doubt on the wisdom of using this significance test, you might ask yourself the second question: What test should I perform instead? Given my confidence that rational people will prefer more of a good thing to less, I could treat the 100 percent preference for more as the null hypothesis. Finding 100 percent agreement would corroborate this hypothesis. It would do so without the use of a significance test. How then would I respond to finding that one individual preferred less to more? For such a case, Cohen (1994) wryly asked, "Would you believe it if someone wanted to know which test to use?" Cohen must have thought it obvious that this was no case for significance testing. I agree with him (see my post What Cohen Meant). A single violation refutes the assumption of unanimity.

Had Hsee and Abelson found a preference for more in at least 32 and at most 47 of their respondents, both the hypothesis of chance and the hypothesis of unanimity would have been rejected. What would be left is a descriptive statement of the empirical proportion, its standard error (which depends on the sample size), and an informed, non-statistical judgment as to whether the observed result is a high or a low value. Bayesian analyses can push things further by providing ratios that quantify the relative support for various hypotheses (where the hypotheses of chance agreement and perfect agreement are two extreme examples).
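To make the descriptive and Bayesian options concrete, here is a rough sketch in Python. The count of 40 out of 48 is a made-up illustration, not a result from the study, and the choice of a uniform prior over the true proportion as the alternative to chance is my assumption; other priors would give other ratios.

```python
from math import comb, sqrt


def describe_proportion(k, n):
    """Return the observed proportion and its standard error."""
    p_hat = k / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


def bayes_factor_vs_chance(k, n):
    """Support for a uniform prior on the proportion relative to chance.

    With a uniform prior, the marginal probability of any count k out of n
    is 1 / (n + 1). Under chance (p = 0.5) it is C(n, k) / 2**n.
    Values above 1 favor the uniform alternative over chance.
    """
    marginal_uniform = 1 / (n + 1)
    likelihood_chance = comb(n, k) / 2 ** n
    return marginal_uniform / likelihood_chance


# Hypothetical intermediate result: 40 of 48 prefer the higher salary.
p_hat, se = describe_proportion(40, 48)
print(f"proportion = {p_hat:.3f}, standard error = {se:.3f}")
print(f"Bayes factor against chance = {bayes_factor_vs_chance(40, 48):.3g}")
```

Note that the hypothesis of perfect agreement assigns zero probability to any count short of 48, which is another way of saying that a single violation refutes it.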

Let us return to the point that in the Hsee and Abelson study, the finding of 48 out of 48 preferring more money to less requires no significance testing. Reporting test statistics and probability values has a pragmatic dimension. The reporting suggests that added value is being communicated. Pragmatic communication gives as much information as necessary and as little as possible. In the Hsee and Abelson case, the reporting of the p value and the fact that a chi-square test was used is uninformative. It adds nothing to what your eyeballs tell you. You might counter by saying that had the sample consisted of four or fewer respondents, even unanimity would not have pushed p below .05. But had there been so few respondents, the study would have been poorly designed. The non-rejection of the null hypothesis would have been guaranteed. Assuming that the study’s sample was large enough to allow meaningful testing, finding a perfect result obviates the need for such testing. This may seem ironic, but it is simply a pleasant fact.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Hsee, C., & Abelson, R. P. (1991). Velocity relation: Satisfaction as a function of the first derivative of outcome over time. Journal of Personality and Social Psychology, 60, 341-347.