The Paradox of A/B Testing

How do we test our nudges when people seem wary of experiments?

Posted Feb 13, 2020

Source: Todd Lapin/flickr

At most auto dealerships, salespeople who meet a quota earn a cash bonus. But given what we know about loss aversion and the endowment effect, a group of researchers wondered: Would salespeople sell more cars if they were given the bonus upfront, with the risk of having to repay it if they fell short of expectations? The answer, rather unequivocally, is no. In an experiment including nearly 300 auto dealerships, such loss-framed incentives reduced sales by 5%, resulting in $45 million lost over four months.

This post, however, is not about selling cars—it’s about experimentation. In response to these unexpected results, Maritz Chief Behavioral Officer and study co-author Charlotte Blank tweeted:

Source: @CharlotteBlank/Twitter

“Don’t just do it; test it.” Sage advice, especially considering what could have happened if loss-framed incentives were implemented across the board. And I completely agree with Blank: Well-designed experiments, even when the underlying science seems perfectly sound, remain our surest way to know whether and how nudges affect people’s behavior.

But experimentation bears a cost. Implementing nudges in the real world requires that people find those nudges acceptable. Even the most helpful nudge is of no use if met with opposition. Yet people often approach experiments with trepidation. Why are people wary of experiments, and how can we communicate the value of research that intervenes in people's day-to-day lives?

The A/B Illusion

A team led by Michelle N. Meyer tested several scenarios in which people were presented with two hypothetical policies, such as two different ways a company could encourage new employees to sign up for retirement benefits. When asked about these solutions in isolation, about 20% of people found them unacceptable. Nearly 40% of people disapproved, however, when told that the company would conduct an experiment to figure out which policy worked best. Meyer and colleagues called this the “A/B illusion”—referring to A/B testing, or the testing of an A option and a B option—and replicated this finding across multiple domains, from hospital safety procedures to teacher incentives.
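To get a feel for how large a gap between roughly 20% and 40% disapproval is, here is a minimal sketch of a standard two-proportion z-test in Python. The two shares are the approximate figures quoted above; the group size of 200 per condition is a hypothetical number chosen only for illustration, and the function name `two_proportion_z` is my own. This is not the analysis Meyer and colleagues ran.

```python
import math


def two_proportion_z(p_a, n_a, p_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    # Standard error of the difference under that null hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Approximate shares from the text; n = 200 per group is a HYPOTHETICAL sample size.
z, p = two_proportion_z(0.20, 200, 0.40, 200)
print(f"z = {z:.2f}, two-sided p = {p:.6f}")
```

With groups of this assumed size, a 20-point gap would be very unlikely to arise by chance. Smaller groups would make the same gap less conclusive, which is one reason the replications across domains matter.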

Why would more people disapprove of a policy presented as an arm of an experiment than of a lone, untested policy that impacts everyone? One potential reason for the A/B illusion is that people might not accurately judge how they feel about an option until it is presented next to another option. For example, over 600 Swedes were surveyed about policies requiring people who don't want to participate in a program to opt out (i.e., automatic enrollment), such as well-known opt-out programs for organ donation and retirement savings. People were more accepting of automatic enrollment when it was presented in isolation than when it was presented as an alternative to mandating that everyone take part. The suggestion of two plausible solutions to a problem, even when one is highly restrictive and almost universally disliked, appears to erode support for both.

The Knowledge Illusion

Another reason experiments raise objections is that they undermine confidence in the policymaker. People have faith that experts know what works, and running an experiment shatters this "knowledge illusion." To scientists, nothing is more prudent than testing our nudges, but non-scientists can interpret our experiments as evidence that we really have no idea what’s going on. That’s not a good feeling.

So can we just tell people why we believe each idea will work, and that we’re trying to figure out which works best? That line of persuasion is tough. In a study on policies to reduce consumption of tobacco, alcohol, and unhealthy foods, directly telling people that a policy would work had a significant but very small impact on how acceptable people rated that policy. Even putting an exact number on the policy’s effectiveness made no difference. In general, how acceptable people consider a nudge to be is based on their pre-existing belief about how effective it will be in helping people or solving a societal issue. But more often than not, those pre-existing beliefs are completely wrong.

Why Do We Even Care about Acceptability?

Organizations are more excited than ever to craft new policies based on scientific evidence (which is awesome), but they can be wary of testing said policies scientifically. Companies such as Facebook, which has used its platform to study emotional contagion and influence voter turnout, have come under fire for their A/B testing. Articles use words like “manipulated,” “creepy,” and “guinea pigs,” clearly articulating the authors’ feelings about those experiments. Although these experiments certainly raised ethical questions about informed consent and appropriate levels of risk, would we have heard the same outcry had Facebook simply changed its whole platform without testing how it would affect users? Probably not. Being part of an experiment might raise people's eyebrows, yet those same people probably delete emails titled "Updates to our User Agreement" or "Changes to our Privacy Policy."

To return to Blank’s point, we should experimentally (and transparently) determine which practices and policies work best when it comes to selling cars, getting students through college, or improving health care. Subjecting people to untested nudges, even with a strong theoretical rationale, is often less ethical than conducting an experiment. Thus, conducting experiments—while maintaining public acceptability—is an essential, undervalued, and really tricky part of nudge design. Is there a way to help people see past the A/B illusion?

Given how relatively new this research is, we don’t yet know much about how to frame experiments to increase acceptability. We could leverage social proof, highlighting times when experiments saved the day (like preventing auto dealers from losing millions in sales). But studies like that run the risk of associating experiments with negative outcomes, inadvertently convincing people that we’re better off with the status quo. We can emphasize the scope of a problem when the status quo is clearly ineffective, but that might motivate people to skip experimentation altogether and jump right to widespread implementation. The A/B illusion will certainly be difficult to untangle, but must be carefully considered as we continue the necessary work of rigorously testing our nudges in the real world.

References

Hagman, W., Erlandsson, A., Dickert, S., Tinghög, G., & Västfjäll, D. (2019). The effect of paternalistic alternatives on attitudes toward default nudges. Behavioural Public Policy, 1-24.

Meyer, M. N., Heck, P. R., Holtzman, G. S., Anderson, S. M., Cai, W., Watts, D. J., & Chabris, C. F. (2019). Objecting to experiments that compare two unobjectionable policies or treatments. Proceedings of the National Academy of Sciences, 116(22), 10723-10728.

Reynolds, J. P., Archer, S., Pilling, M., Kenny, M., Hollands, G. J., & Marteau, T. M. (2019). Public acceptability of nudging and taxing to reduce consumption of alcohol, tobacco, and food: A population-based survey experiment. Social Science & Medicine, 236, 1-10.