In one of the most interesting short reports I have read recently, researchers in Australia examined what effect blind reviews would have on hiring. The premise of the research, as far as I can surmise, was a fear of conscious or unconscious bias against women and minority groups in hiring. Such a bias would naturally make it harder for those groups to find employment, ultimately yielding a less diverse workforce. In the interest of avoiding that bias, the research team compared what happened when candidates were assessed on either standard resumes or de-identified ones. The latter resumes were identical to the former, except that group-relevant information (like gender and race) had been removed. If reviewers don’t have information about race or gender available, then they can’t possibly assess the candidates on that basis, whether consciously or unconsciously. That seems straightforward enough. The aim was to compare the results of the blind assessments to those of the standard resumes. As it turned out, there were indeed hints of bias; relatively small in size sometimes, but present nonetheless. However, the bias did not go in the direction that had been feared.

Shocking that the headline wasn’t “Blind review processes are biased”
Source: Flickr/Ivana Vasilj

Specifically, when the participants assessing the resumes had information about gender, they were about three percent more likely to select women, and three percent less likely to select men. Further, minorities were also more likely to be selected when the information was available (about six percent for males and nine percent for females). While there’s more to the picture than that, the primary result seemed to be that, when given the option, these reviewers discriminated in favor of women and minority groups simply because of their group membership. If these results had run in the opposite direction (against women and minorities), there would no doubt have been calls for increasing blind reviews. However, because blind reviews seemed to disfavor women and minorities, the authors had a different suggestion:

Overall, the results indicate the need for caution when moving towards ‘blind’ recruitment processes in the Australian Public Service, as de-identification may frustrate efforts aimed at promoting diversity

It’s hard to interpret that statement as anything other than “we should hire more women and minorities, regardless of qualifications.” Even if sex and race ought to be irrelevant to the demands of the job and candidates should be assessed on their merit, people should also apparently be cautious when removing those irrelevant pieces from the application process. The authors seemed to favor discrimination based on sex or race so long as it benefited the right groups. Such discriminatory practices have led to negative reactions on the part of others, as one might expect.

This brings me to another question: why should we value diversity when it comes to hiring decisions? To be clear, the diversity being sought is often strictly demographic in nature (many organizations tout diversity in race, for instance, but not in perspective. I don’t recall the draw of many positions being that you will meet a variety of people who hold fundamental disagreements with your view of the world). It’s also usually the kind of diversity that benefits women and minorities (I’ve never come across calls to get more white males into certain fields dominated by women or other races. Perhaps they exist; I just haven’t seen them). But are there real economic benefits to increasing diversity per se? Could it be the case that more diverse organizations just do better? On the face of it, I would assume the answer is “no” if the diversity in question is simply demographic in nature. What matters when it comes to job performance is not the color of one’s skin or what sex chromosomes one possesses, but rather the skills and competencies one brings to the job. While some of those skills and competencies might be very roughly approximated by race and gender if you have no additional information about your applicants, we thankfully don’t need to rely on those indirect measures. Rather than asking about gender or race, one could just ask directly about skill sets and interests. When you can do that, the additional value of knowing one’s group membership is likely close to nil. Why bother using a predictor of a variable when you can just use the variable itself?

Do you really love roundabouts that much?
Source: Flickr/Andrew Skudder

Nevertheless, it has apparently been reported before that demographic diversity predicts the relative success of companies (Herring, 2009). A business case was made for diversity, such that diverse companies were found to generally do better than less diverse ones across a number of different metrics. Not that those in favor of increasing diversity really seemed to need a financial justification, but having one certainly wouldn’t hurt their case. As this paper was apparently popular within the literature (for what I assume is that reason), a replication was attempted (Stojmenovska et al., 2017), beginning in a graduate course as an assignment to help students “learn from the best.” Since it seems “psychology research” and “replications” mix about as well as oil and water as of late, the results turned out a bit worse than hoped. The student wasn’t even trying to look for problems; they just stumbled upon them.

In this instance, the replication attempt failed to find the published result, instead catching two primary mistakes made in the original paper (mistakes, as opposed to anything malicious): there were a number of coding mistakes within the data, and the sample data itself was skewed. Without going too deeply into why these are problems, it should suffice to say that coding mistakes are bad for all the obvious reasons. Fixing the coding mistakes by deleting missing data resulted in a substantial reduction in sample size (25-50 percent smaller). As for the issue of skew, a skewed variable can lead to an underestimation of its relationship with other variables. In brief, that meant there were confounding relationships between predictor variables and the outcomes that were not adequately controlled for in the original paper. To correct for the skew issue, a log transformation on the data was carried out, resulting in a dramatic increase in the relationship between particular variables.

To give a concrete sense of that increase: in the original report, the correlation between company size and racial diversity was .14; after the log transformation was carried out, that correlation increased to .41. This means that larger companies tended to be more racially diverse than smaller ones, but that relationship was not fully accounted for in the original paper examining how diversity impacted success. The same issue held for gender diversity and establishment size.
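For readers who want to see why a log transformation can raise a correlation like that, here is a minimal sketch in Python using made-up data. Nothing in it comes from the Herring data set: the variable names, the sample size, and the numbers controlling the simulation are all assumptions chosen purely for illustration. The simulated company sizes are heavily right-skewed (a few giants, many small firms), and the simulated diversity score tracks the logarithm of size plus noise.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def simulate(n=5000, seed=0):
    """Return (size, diversity) for n hypothetical companies.

    All parameters are illustrative assumptions, not estimates
    from any real data set.
    """
    rng = np.random.default_rng(seed)
    # Right-skewed company sizes: most are small, a few are enormous.
    size = rng.lognormal(mean=4.0, sigma=1.5, size=n)
    # Diversity score depends on log(size), plus unrelated noise.
    diversity = 0.3 * np.log(size) + rng.normal(0.0, 1.0, size=n)
    return size, diversity


size, diversity = simulate()

# Correlation using the raw, skewed size variable.
r_raw = pearson_r(size, diversity)
# Correlation after log-transforming size to remove the skew.
r_log = pearson_r(np.log(size), diversity)

print(f"raw size vs. diversity:    r = {r_raw:.2f}")
print(f"logged size vs. diversity: r = {r_log:.2f}")
```

In this toy setup the handful of very large companies dominates the raw correlation, so the raw figure understates how closely size and diversity move together; logging size pulls those extreme values in and the correlation rises. If size is then used as a control variable in its raw form, it soaks up less of the shared variation than it should, which is the sense in which the relationship was "not fully accounted for."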

Once these two issues—coding errors and skewed data—were addressed, the new results showed that gender and racial diversity were effectively unrelated to company performance. The only remaining relationship was a small one between gender diversity and the logged number of customers. While seven of the original eight hypotheses were supported in the first paper, the replication attempt correcting these errors only found one of the eight to be statistically supported. As most of the effects no longer existed, and the one that did exist was small in size, the business justification for increasing racial and gender diversity failed to receive any real support.

Very colorful, but they ultimately all taste the same
Source: Flickr/Björn Söderqvist

As I initially mentioned, I don’t see a very good reason to expect that a more demographically diverse group of employees should yield better outcomes. They don’t yield worse outcomes either. However, the study from Australia suggests that the benefits of diversity (or the lack thereof) are basically beside the point in many instances. That is, not only would I imagine this failure to replicate won’t have a substantial impact on many people’s views on whether or not diversity should be increased, but I don’t think it would even if diversity were found to be a bad thing, financially speaking. This is because I don’t suspect many views about whether diversity should be increased rest on the foundation that it’s good for people economically in the first place. Increasing diversity isn’t viewed as a tricky empirical matter so much as a moral one; one in which certain groups of people are viewed as owing or deserving various things.

This is only looking at the outcomes of adding diversity, of course. The causes of such diverse levels of diversity across different walks of life are another beast entirely.

References: Stojmenovska, D., Bol, T., & Leopold, T. (2017). Does diversity pay? A replication of Herring (2009). American Sociological Review, 82, 857–867.

Herring, C. (2009). Does diversity pay? Race, gender, and the business case for diversity. American Sociological Review, 74, 208–224.
