4 Changes I Will Make When Using Amazon MTurk for Research
Some ideas on using MTurk to collect data more ethically and effectively
Posted July 31, 2015
A few days ago, I described my rather intense and not altogether pleasant experience working part-time as an Amazon Mechanical Turk (MTurk) worker. Not only did this experience leave me “shaken and stirred” so to speak, but it also made me think about changes I need to make when I go back to using MTurk as a researcher to collect survey or experimental data.
Here is my list of four changes I plan to make when using MTurk to post my work. Fellow researchers are welcome to join :)
1. Pay workers twice as much!
The issue of pay was perhaps the most striking revelation from my experience, and one that would have been impossible without walking in an MTurk worker’s shoes for a week. Until now, when I posted my surveys on MTurk, I thought of workers in very abstract and fuzzy ways, if I thought of them at all.
Having worked in commercial marketing research before, I had a nagging sense in the back of my mind that I was getting my data at a bargain-basement price (I once worked on a research project involving prisoners, and even such a captive audience cost about $3/survey), but I brushed it off. At what cost? After my worker experience, it is clear to me that I was not paying enough for the value I was getting from MTurk workers.
For a 6-7 minute survey, I offered between $.40 and $.60, depending on the number of open-ended questions, whether respondents had to read and comprehend passages, etc. From now on, I am going to offer at least $1.00. With my limited research budget, this unilateral pay raise means that I will have to work harder on the front end, designing and planning studies more carefully and doing fewer of them.
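To make the arithmetic behind this raise concrete, here is a minimal Python sketch that converts a flat per-survey payment into an effective hourly rate. The function name `hourly_rate` is made up for illustration; the dollar amounts and the 6-7 minute duration are the figures quoted above, and the sketch ignores Amazon's commission entirely.

```python
def hourly_rate(pay_per_survey: float, minutes: float) -> float:
    """Effective hourly wage implied by a flat per-survey payment.

    Illustrative helper (not part of any MTurk tool): assumes the worker
    does similar surveys back to back with no search or waiting time.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return pay_per_survey * 60.0 / minutes


if __name__ == "__main__":
    # Old offers ($0.40, $0.60) versus the new floor ($1.00),
    # for a survey that takes between 6 and 7 minutes.
    for pay in (0.40, 0.60, 1.00):
        slow = hourly_rate(pay, 7)  # slower worker, lower hourly rate
        fast = hourly_rate(pay, 6)  # faster worker, higher hourly rate
        print(f"${pay:.2f}/survey: ${slow:.2f} to ${fast:.2f} per hour")
```

Under these simplifying assumptions, the old offers work out to roughly $3.43 to $6.00 per hour, while $1.00 per survey works out to roughly $8.57 to $10.00 per hour.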
(And I do realize that Amazon recently increased the commission it charges researchers substantially. But the way I see it, that’s an entirely different issue best left to more ferocious fighters; it’s got nothing to do with fair worker remuneration.)
There you go, a marketing professor offering to pay double! Strange, I know.
2. Diligently control and maintain survey quality
From the standpoint of MTurk workers, anything that slows them down hurts, both cognitively (in how they think and respond to questions) and in the pocketbook! Such pain points can include many things: questions that are tongue-twisters or are worded poorly or ambiguously; spelling and grammar mistakes; convoluted survey flows; tasks that make respondents sit for more than two minutes while a digital watch agonizingly runs down; open-ended questions that force people to write a certain humongous number of words (or characters; I feel that anything more than 100 characters is too much); and surveys that are too long (anything more than 70 total questions is too long in my view). All of these contribute greatly to survey-taker pain.
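For researchers who like to automate such checks, the three numeric rules of thumb above can be sketched as a tiny Python pre-flight check. This is only an illustration: `SurveySpec`, its field names, and `pain_points` are hypothetical, and the thresholds are simply my personal limits from the paragraph above, not any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurveySpec:
    """Hypothetical summary of a survey; field names are illustrative."""
    total_questions: int
    max_required_chars: int       # largest minimum length forced on an open-ended answer
    longest_forced_wait_sec: int  # longest timer a respondent must sit through


def pain_points(spec: SurveySpec) -> List[str]:
    """Return a warning for each rule of thumb the survey exceeds."""
    problems = []
    if spec.total_questions > 70:
        problems.append("more than 70 total questions")
    if spec.max_required_chars > 100:
        problems.append("open-ended answer forced beyond 100 characters")
    if spec.longest_forced_wait_sec > 120:
        problems.append("forced wait longer than two minutes")
    return problems


if __name__ == "__main__":
    draft = SurveySpec(total_questions=85,
                       max_required_chars=250,
                       longest_forced_wait_sec=180)
    for warning in pain_points(draft):
        print("Fix before posting:", warning)
```

A check like this catches only the countable problems; typos, ambiguous wording, and convoluted flows still need a human read-through.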
In the studies I (or my co-authors) have posted on MTurk, we have generally been good about these things. But we do falter from time to time. Sometimes, we miscode a response type in a question so that a blank that should really be filled with free text demands an email address instead. Other times, we leave a glaring typo in the survey that distracts respondents. It’s tiny things like these that we are going to focus on fixing.
My own most egregious quality-control oversight is that I will often skip copy-editing, pretesting and running through questionnaires I am about to post on MTurk (out of sheer laziness or unwarranted over-confidence, I don’t know which).
My mantra from now on: Copyedit and then pretest the questionnaire on myself (and co-authors or minions) at least three times before it is ever posted for MTurk workers. If I can’t work through it smoothly, chances are that respondents won’t either.
I made a checklist that walks through all these potential issues and may be useful to researchers conducting online surveys. It can be downloaded here.
3. Not just obtain informed consent but also provide opportunity for “informed feedback”
Like most researchers, I insert an informed consent form explaining my study’s procedure, its potential risks, my contact information, etc. at the very beginning of my questionnaire. But I now realize that this is not as helpful as it could be for a couple of reasons.
- The form is rather dense. It is unreasonable to expect MTurk workers on the clock to carefully read 300+ words even before the study begins. Even if I ask workers to print it out for future reference, it is unlikely that most do so.
- The informed consent form is very often the only place where the researcher’s name and contact information are provided. However, the worker really needs this information at the very end, after the final button is pressed and the survey is submitted.
So in addition to obtaining informed consent, I am going to obtain “informed feedback” at the very end after the survey has been submitted and the completion code given. This is where it seems appropriate to provide my contact information and ask for feedback and give an opportunity to voice any complaints.
Marketers routinely ask customers about their experience after a restaurant meal or an airplane ride, so why shouldn’t researchers do the same thing after an MTurk survey?
4. Supplement MTurk studies with drastically different samples and methods
Regardless of the statistical adequacy of MTurk data, there is something inherently stinky about social science research conducted solely with a method that has survey-takers filling out surveys like assembly line workers putting together Apple watches.
Let me tell you why. In moral decision making, an important (and horrific) class of problems involves choosing between two REALLY bad situations (which I sincerely hope none of us ever have to encounter in our real lives). Such dilemmas include: “Would you smother one child with your bare hands to save a dozen strangers from being killed?” or “Would you cook and serve up your pet dog/cat in a village feast if it meant that your whole village would be spared from being slaughtered forthwith?” Yikes!
These types of questions are ghastly. For any normal human being, they create emotional and moral turmoil when you encounter them (at least they did for me!). That is what they are meant to do. But suppose you encounter the same questions again and again… and again; let’s say you encounter a dozen of these questions in 20 surveys each week (moral decision making is a popular research topic, after all).
Are you going to feel the same intense moral quandary the 17th time you answer such questions? No way! You will go like: “Yawn! Oh well, to appease the marauders, I will cook up Rover into steaks for the village feast! Next!” My point is that you might still answer the questions the way the researcher wants, expects, or predicts, but surely without the emotional turmoil that lies at the heart of such moral decisions!
There are at least two other reasons that I am wary about conducting solely MTurk sample-based research. First, because of the ease, cost, and speed of running MTurk studies, there is a growing tendency to run numerous studies for the same project with the idea of replicating the finding and using fancy statistical methods like small-scale meta-analysis to figure out the effect’s size more reliably.
However, this deluge of cheaply acquired data overlooks the fact that all the studies have been done with survey-takers who are homogeneously jaded; so the results may not generalize beyond assembly-line survey takers. And the truth of the matter is that we know far too little about professional survey takers beyond the fact that they are relatively few in number.
Second, for applied researchers in particular, the complacency that comes from having conducted numerous online studies shifts the focus away from acknowledging the severe limitations of the text-based tasks, built on imaginary stories and scenarios, that online surveys can accommodate. Such measurements may or may not transfer to the real decisions and behaviors that we are really interested in studying.
For these reasons, I plan to use MTurk samples mainly for conducting pre-tests and for exploratory research that constitutes a small fraction of studies in any particular research project. As it is, I prefer to conduct field studies (with real customers, for example) and measure real behaviors rather than opinions about how survey-takers would behave.
I want to say one final thing (if you, the reader, got this far without storming off cursing me soundly). When posting work on MTurk, wouldn’t it be nice if researchers used their real name and stated their affiliation clearly? Using pseudonyms like “SocPerfLab37” creates the impression of online spammers who pitch Viagra every day!
A collective of MTurk workers and researchers called Dynamo has put together an excellent set of guidelines for researchers that covers many of the things I have touched on here in a lot more depth, plus much more. The group has also done some interesting research on collective action among MTurk workers.
More About Me
You can find and download a lot of my academic and some of my practitioner-oriented writing at SSRN. If you can’t find something old I have written, shoot me an email and I will send it to you.
Some of my writing for managers and business people can be found at HBR.org and I also write a relatively new blog called “The Science behind Behavior” on Psychology Today.
You can connect with me on LinkedIn or Facebook, or you can send me an email. All questions, comments, thoughts, and ideas for future blog pieces or academic research projects are welcome. Just don’t send me any hate mail, please.
[This post is cross-posted from LinkedIn Pulse]