Replicating Distinction Bias: Joint vs. Separate Evaluations

Our journey in open science, replicating a classic phenomenon in decision-making

Posted Feb 22, 2021

This post was written by Wing Yiu Hung, who completed her undergraduate thesis under the supervision of Gilad Feldman with the Department of Psychology at the University of Hong Kong. She completed a replication and extension of Hsee and Zhang’s (2004) Distinction Bias. Below, she shares her experiences in conducting a replication study and some of her findings and insights. Gilad Feldman edited this post for Psychology Today.

I first learned about the "replication crisis" when Gilad suggested that I work on a replication and extension study for my final-year thesis. At the time, I was confused about the need to replicate others' research. Why would I try to replicate a study when there was already evidence to support the theory? Looking back, I am glad that I had the chance to work on such a meaningful project.

What is the "Replication Crisis"?

Replicability is one of the hallmarks of science. It helps assess whether novel findings likely reflect a real effect or may just be a false alarm (false positive), thereby building the credibility of research findings. However, in recent years, mass replication projects conducted by large collaborations of labs around the world, such as the Open Science Collaboration (2015) and Many Labs (1-5), found fewer successful replications than expected among highly cited studies in well-regarded journals. Growing evidence in both the social and hard sciences suggests that there is a crisis in science and that we need to examine our research practices more closely and work together to reform and improve them.

There are inherent issues regarding transparency in methods and procedures, a lack of sharing of datasets and code, reliance on small, underpowered samples, and questions regarding the generalizability of theory and evidence. In addition, there are increasing concerns that failures to replicate may also reflect common "questionable research practices" (QRPs) in our literature (Feldman et al., 2019). These are research decisions that scholars make, consciously or subconsciously, which capitalize on the flexibility in research decision-making and increase the probability of showing support for an effect, even when it is not supported by the evidence. They include, but are not limited to, the selective reporting of methods and results and the selective exclusion of experimental conditions and participants from the analyses. Such decisions can make results appear to pass the common threshold for publication (p < .05), a practice known as "p-hacking".
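To make the effect of selective reporting concrete, here is a minimal simulation sketch. It is an illustration only and is not part of our replication: the function names, sample sizes, and number of outcome measures are all hypothetical, and the p-value uses a large-sample normal approximation rather than an exact t-test. Every simulated study has no true effect, so any "significant" result is a false positive.

```python
import random
from statistics import NormalDist, mean, variance


def p_value(a, b):
    """Two-sided p-value from a large-sample z approximation to Welch's t-test."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=2000, n_per_group=50, n_outcomes=3, alpha=0.05, seed=1):
    """Return false-positive rates for one planned outcome vs. the best of several."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    single = 0    # studies that report only the first, pre-planned outcome
    flexible = 0  # studies that report whichever outcome looks best
    for _ in range(n_studies):
        ps = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no real effect.
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            treatment = [rng.gauss(0, 1) for _ in range(n_per_group)]
            ps.append(p_value(control, treatment))
        single += ps[0] < alpha
        flexible += min(ps) < alpha
    return single / n_studies, flexible / n_studies


single_rate, flexible_rate = simulate()
print(f"One pre-planned outcome: {single_rate:.3f}")
print(f"Best of three outcomes:  {flexible_rate:.3f}")
```

Under these assumptions, the first rate should land near the nominal 5%, while the second should be noticeably higher: with three independent tests, the chance that at least one falls below .05 is 1 - 0.95^3, or about 14%.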

Replication Study on Distinction Bias

As part of my thesis, I conducted a replication of Hsee and Zhang’s (2004) experiments on distinction bias, a very interesting phenomenon in judgment and decision-making. On the one hand, when people make decisions in their everyday lives, they are often in joint evaluation (JE) mode, and they rely on comparisons to infer the value and desirability of the options.

For example, people can easily compare an underfilled cup of ice cream against an overfilled cup that holds less ice cream. They can directly compare the amount of ice cream and its value, and they can consider whether the size of the cup holding the ice cream is relevant to their evaluations (see Fig. 1). In such situations, they may infer that they would ignore the context and that their happiness would depend only on the amount of ice cream they have.

On the other hand, some real-life situations do not allow for easy comparison between options, such as when options are presented in separate evaluation (SE) mode. People then need to assess a single situation and cannot easily compare it with other alternatives.

Returning to the ice cream example, people who see only one cup would have a difficult time assessing the value of the amount of ice cream supplied, which might lead them to rely on contextual cues, such as the size of the cup holding it and whether it was under- or overfilled. In such cases, people might actually show a preference for an overfilled cup, even when it has less ice cream.

If we contrast the two evaluation modes, people make comparisons when comparisons are available; yet it is possible that they would mispredict their assessments of the same options when they are presented separately.

Figure 1. Source: Hsee (1998)

Distinction bias is the phenomenon in which people who compare two alternatives overpredict the difference in happiness those alternatives would generate when the alternatives differ only quantitatively. For example, people in joint evaluation mode may overpredict the difference in happiness between having a 330 mL soft drink and having a 340 mL soft drink, thinking the latter would lead to higher happiness. However, when people are given only one of the soft drinks, they report similar levels of happiness because their assessment of happiness is not informed by a comparison to alternatives.

In the replication study, we focused on Studies 1 and 2 in the classic article on distinction bias by Hsee and Zhang (2004).

In Study 1, participants in Joint Evaluation mode were asked to imagine that they had written a book of poems and to predict their happiness in four possible scenarios regarding how many people bought the book: no one, 80 people, 160 people, or 240 people. We then compared predicted happiness between the Separate Evaluation and Joint Evaluation modes across qualitatively and quantitatively different scenarios (see Table 1).

In this case, a qualitative difference is the difference between not having sold any books at all and having sold some books, whatever the number (no books sold versus 80 books sold). In other words, moving from nothing to something is a change in kind rather than in degree.

A quantitative difference is a numerical difference within a qualitative category, and in this case, it would be the difference in the number of books sold (80 versus 160 versus 240 books sold).

Consistent with the original finding, as shown in Figure 2, we found that participants in joint evaluation mode overpredicted the differences in happiness between quantitatively different situations (80 vs. 160 vs. 240), but not between qualitatively different situations (nothing versus something). Whereas participants in joint evaluation mode predicted that they would be significantly happier the more people bought their books, participants in separate evaluation mode only showed greater happiness in the 80-buyer situation than in the no-buyer situation, with much weaker to no differences among the other situations that were quantitatively different.

Table 1 and Figure 2. Source: Our replication

Study 2 aimed to investigate whether distinction bias also applies to real experiences, using a reading task. Participants in the Separate Evaluation group were asked to read and copy-paste a list of words, whereas those in the Joint Evaluation group were asked to predict the level of happiness experienced by participants in the Separate Evaluation group (see Table 2). Hsee and Zhang (2004) found that participants in the Joint Evaluation group overpredicted the differences in happiness experienced by participants in the Separate Evaluation group in quantitatively different scenarios but not in qualitatively different scenarios. However, we failed to find support for the original findings in the replication study (see Fig. 3).

Table 2 and Figure 3. Source: Our replication

Value of replications and evaluating replication findings

When the results of a study can be replicated, we increase our confidence in the reliability of the evidence. When our replication fails to find support for original findings, we can make adjustments to our understanding of the phenomenon to suggest that—at the very least—it is more complex and nuanced than we originally thought. Finding support for studies in a replication does not mean that their findings are always true or that the underlying theory always holds. Similarly, failing to find support for the findings in a replication does not mean that these findings are always false or that the underlying theory has been debunked.

Replications are a delicate and complicated process, and there could be many reasons why findings replicate or fail to replicate. In conducting replications, we replicators try as best we can to minimize deviations from the original procedures and address possible concerns, yet there are always differences. Times have changed, participants are different, and it is very likely that the setting and the context have changed.

Source: Wing Yiu Hung

Nonetheless, doing replications is a valuable learning experience and at the heart of the scientific process. Replicators gain a better understanding of science and the importance of open science in communicating findings openly and transparently.

For a long time, academia emphasized novelty, and researchers were incentivized to devote their research to finding something new. With increasing awareness of the challenges in reproducibility and replicability comes the realization that we need credible, reliable research. That means adopting open science practices and having scholars who are devoted to revisiting existing evidence and dedicated to separating the signal from the noise (Nosek, 2019). By working together, promoting scientific transparency, and being open and collaborative in our assessment of our science, we can hopefully improve and do better, more credible science.


Feldman, G. and student collaborators (2019). Taking Stock of the credibility revolution: Scientific reform 2011-now. Retrieved from

Hsee, C. K., & Zhang, J. (2004). Distinction bias: Misprediction and mischoice due to joint evaluation. Journal of Personality and Social Psychology, 86(5), 680-695.

Nosek, B. (2019). Shifting incentives from getting it published to getting it right.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.