Play it again, Sam.

~Bogart in Casablanca, misquoted

In the hand and pen wringing over the state of psychological science (see the November 2012 issue of Perspectives on Psychological Science), the consensus is that too many published studies report anticipated rejections of null hypotheses (see here and there for allergic reactions). The glut of non-null results raises the specter of false positives, which many of the issue’s contributors are out to slay. Many call for replication studies to separate the true from the false positives. How a culture can be achieved in which replication studies are valued remains to be seen. Koole & Lakens make some interesting recommendations, which I do not dispute here. Instead, I explore their diagnosis of why replications are currently undervalued.

Koole & Lakens (2012, p. 610) write that “the current incentive structure within psychology gives rise to a social dilemma in which researchers’ collective interests run in the opposite direction as the interests of individual researchers. The research community collectively benefits if researchers take the trouble to replicate each other’s work, because this improves the quality and reputation of the field by showing which observations are reliable. Yet, individually, researchers are better off by conducting only original research, because this will typically yield more publications and citations and thus ultimately greater rewards in terms of better jobs and more grant money.”

Doing and publishing novel work is thus construed as an act of defection, whereas doing a replication is seen as an act of cooperation. The individual researcher would rather defect, whereas the community wants him to cooperate. Stated this way, the interests of individual and community are diametrically opposed. How bad is it really? Here I come to the part of messing with other people’s metaphors.

Social dilemmas come in various forms, or games. The game that seems to fit the research dilemma is the prisoner’s dilemma. As may be familiar from other posts (such as this one) or books (such as Poundstone, 1993), four different payoffs await the player, and the value of each depends on the intersection between his own choice and the choices of others. The individual player ranks his payoffs as T (unilateral defection) > R (mutual cooperation) > P (mutual defection) > S (unilateral cooperation). The ranking of the collective (or summed) payoffs is 2R > T+S > 2P. If we assign numerical values reflecting the ranks and compute correlations, we see that the conflict of interest between a player and the collective (of which he is a part) is less pronounced (.32) than the conflict between two players (-.8). The positive correlation between individual and collective is driven by the fact that mutual cooperation is more beneficial to both the individual and the collective than is mutual defection. According to the standard exegesis (not my own, though), this correlation is not enough to motivate the individual to cooperate because, after all, defection pays more regardless of what the other does.
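For readers who want to check the two correlations, here is a minimal sketch. It assumes the simplest possible rank values (T = 4, R = 3, P = 2, S = 1) and a two-player game with four outcomes; the variable names and the hand-rolled `pearson` helper are mine, purely for illustration.

```python
from math import sqrt

# Assumed rank values for one player's payoffs (4 = best, 1 = worst).
T, R, P, S = 4, 3, 2, 1

# Payoffs over the four outcomes, in the order CC, CD, DC, DD,
# where the first letter is the row player's move.
row_player = [R, S, T, P]
col_player = [R, T, S, P]

# Collective payoff is the sum of both players' payoffs in each outcome.
collective = [a + b for a, b in zip(row_player, col_player)]


def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_players = pearson(row_player, col_player)     # player vs. player
r_collective = pearson(row_player, collective)  # player vs. collective

print(round(r_players, 2), round(r_collective, 2))  # -0.8 0.32
```

With these assumed ranks, the sketch reproduces the values quoted above: -.8 between the two players and .32 between a player and the collective.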

But the research dilemma cannot be a prisoner’s dilemma. If every researcher cooperated by seeking to perform replication studies, what studies would there be to replicate? The scientific community needs defectors, or else the cooperative act of doing a replication is not possible.

Can the research dilemma be an assurance game, where R > T > P > S? In this game, individual players try to match the strategy of others. They want to cooperate if others cooperate and defect when others defect, which means there are two Nash equilibria, mutual cooperation and mutual defection. Once locked into one of these, no player has an incentive to change strategy. Neither equilibrium is desirable. The research community does not benefit from producing only novel findings, as Koole and Lakens note, and a literature containing only replications of old findings is a dead literature (as in the case of the prisoner’s dilemma).

How about the game of chicken, where T > R > S > P? This game also has two Nash equilibria, but these are found where players do the opposite of what others do. A defector (novelty-seeking researcher) gains inasmuch as there are many cooperators (replication seekers) and vice versa. Neither strategy dominates. The game has a certain appeal for the business of science. Novelty seekers get either the highest or the lowest payoff depending on whether they are in the minority or the majority. Replication seekers get intermediate payoffs, and earn the relatively higher payoff if they are in the majority.
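The claims about equilibria can be checked by brute force. The sketch below assumes a symmetric two-player game and arbitrary rank values from 1 to 4; the function name `pure_nash` and the specific numbers are illustrative assumptions, not anything taken from Koole and Lakens.

```python
def pure_nash(T, R, P, S):
    """List pure-strategy Nash equilibria of a symmetric 2x2 game.

    Payoffs are indexed as (my move, other's move), with
    C = cooperate (replicate) and D = defect (seek novelty).
    """
    pay = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    flip = {"C": "D", "D": "C"}
    equilibria = []
    for a in "CD":
        for b in "CD":
            # Neither player may gain by switching unilaterally.
            a_stays = pay[(a, b)] >= pay[(flip[a], b)]
            b_stays = pay[(b, a)] >= pay[(flip[b], a)]
            if a_stays and b_stays:
                equilibria.append((a, b))
    return equilibria


# Assumed rank values, chosen only to respect each game's ordering.
prisoners_dilemma = pure_nash(T=4, R=3, P=2, S=1)  # T > R > P > S
assurance = pure_nash(T=3, R=4, P=2, S=1)          # R > T > P > S
chicken = pure_nash(T=4, R=3, P=1, S=2)            # T > R > S > P

print(prisoners_dilemma)  # [('D', 'D')]
print(assurance)          # [('C', 'C'), ('D', 'D')]
print(chicken)            # [('C', 'D'), ('D', 'C')]
```

Under these assumptions, the assurance game settles where players match each other, and chicken settles where they do the opposite, as described above.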

The volunteer’s dilemma, where T > R = S > P, is similar to the game of chicken, except that it does not matter to the replication seekers if they are in the majority. Rational players volunteer (cooperate, seek replications) with a probability that can be derived from the payoffs, thus ensuring that there is a healthy (if not optimal) mix of novel studies and replication studies.
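To make "a probability that can be derived from the payoffs" concrete, here is a minimal sketch for the two-player case only (an assumption; with more players the formula changes). In a symmetric mixed equilibrium, each player cooperates with the probability q that leaves the other indifferent between cooperating and defecting, which works out to q = (S - P) / ((T - R) + (S - P)). The payoff values below are hypothetical.

```python
from fractions import Fraction


def volunteer_probability(T, R, P, S):
    """Equilibrium probability of cooperating in a symmetric 2x2 game.

    Derived from the indifference condition
        q*R + (1 - q)*S == q*T + (1 - q)*P,
    which assumes two players and T > R, S > P.
    """
    T, R, P, S = (Fraction(v) for v in (T, R, P, S))
    return (S - P) / ((T - R) + (S - P))


# Hypothetical volunteer's dilemma payoffs: T > R = S > P.
T, R, P, S = 3, 2, 0, 2
q = volunteer_probability(T, R, P, S)

# At equilibrium both choices should pay the same on average.
expected_cooperate = q * R + (1 - q) * S
expected_defect = q * T + (1 - q) * P

print(q, expected_cooperate, expected_defect)  # 2/3 2 2
```

With these made-up payoffs, a player would seek a replication two thirds of the time. Real payoffs in science are of course not known with this precision, so the number itself should not be taken seriously; only the logic of the mix matters.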

In short, the game of science is not a prisoner’s dilemma or an assurance game waiting for a solution; no one wants a situation in which doing replication studies is the only smart thing to do. The game of chicken and the volunteer’s dilemma do not fit descriptively because so few replications are currently being attempted. Perhaps with additional incentives for replications, the payoffs in the game of science will align with these games until the rationality of a mixed strategy prevails.

Meanwhile, a more heavy-handed solution awaits discussion. What if every manuscript, in order to be published in a coveted journal, had to include an attempted replication of some other finding, and what if the outcome of this attempt were not considered in the decision to publish the whole packet? As a policy, this approach would shortcut strategic choices by players who may not even know which game they are playing. Players, ah, scientists, would just need to obey the replication rule in order to succeed.

This scheme is not only heavy-handed, it also remains blind to the problem of false negatives (Fiedler, Kutzner, & Krueger, 2012). How will the field learn about true effects when initial attempts to detect them fail (false non-rejections of the null hypothesis)? Additional incentives would be needed to publish original work that is not statistically significant. If this challenge is not understood and addressed, the game of science will continue to be played with a stacked deck. If our epistemology is deficient, can we at least have some fun?

We (Fiedler et al.) recommended that theory reclaim its role as that which organizes and integrates knowledge and that which tells us where to look for new discoveries. In this era of incentivized empiricism, this view seems quaint and old-fashioned. But sometimes, old ideas deserve another look. Physics has its own respected branch of theoretical physics, and so does economics. Why not psychology? Doing theory for theory’s sake is an empty exercise, but as a way of framing the world and one’s work, it is invaluable.

Fiedler, K., Kutzner, F., & Krueger, J. I. (2012). The long way from α-control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7, 661-669.

Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve psychological science. Perspectives on Psychological Science, 7, 608-614.

Poundstone, W. (1993). Prisoner’s dilemma. New York, NY: Doubleday.
