Labor by txmx-2/ Flickr licensed under CC BY 2.0

Amazon Mechanical Turk (MTurk) is currently the most popular crowdsourced marketplace for work. It has rapidly transformed how social science research is conducted. Over the past three or four years, MTurk has been enthusiastically adopted across the social sciences as a viable mainstream alternative to using students or other respondents to conduct experimental and survey-based research.

For social scientists, collecting data with MTurk has three compelling advantages:

  • Cheap data: With an MTurk sample, data collection is cheap! Compared to other online panels, respondent costs can be as much as 80-90% lower, so data collection budgets stretch much further.
  • Fast data: Data collection is fast. It is entirely possible to gather a few hundred responses within an hour or two. Researchers love the speed because they can test their ideas quickly.
  • Quality data: The quality of data is acceptable. Since 2010, a number of studies have validated the data quality of MTurk samples, finding them to be sufficiently diverse (although not representative) and comparable to other, more expensive online data sources.

As a researcher, I love using MTurk and have conducted more than 50 studies on it so far. Over the past week, however, I changed roles. Instead of posting a study as I usually do, I worked part-time as an MTurk worker. I spent approximately twenty-five hours and took approximately 300 surveys on the site, doing everything from answering questions about witness testimony to watching and evaluating advertisement videos and solving creative problems.

I undertook this exercise for two selfish reasons. First, I wanted to experience life on the “other side of the fence” so to speak, to feel what an MTurk worker’s experience is like so I could be more empathetic in conducting my future studies on MTurk. Second, I wanted to understand how other researchers are using the site, to adopt some of their best practices in designing surveys, recruiting respondents, managing respondents, and so on in my own research.

My experience as an MTurk worker was intense and full of surprises, not all of which were pleasant. Here’s what I learnt:

1. MTurk workers do not earn much money and the site made it impossible to figure out how much I earned per hour of work I performed.

As an MTurk worker, I worked hard but I earned little money. If my experience is any indication, the wages of a typical, diligent MTurk worker are nowhere close to the stipulated minimum wage in the United States. Over my one week as an MTurk worker, I estimate that I earned somewhere between $3 and $3.25 per hour of work (and that includes a couple of unexpected bonuses).

Even more bothersome was the fact that the site did not provide me with my “per-hour” earnings information anywhere, so there was no way for me to know precisely how much I made per hour. When I act as a researcher (or “requester” in MTurk parlance), the site shows me the “Effective Hourly Rate” that I pay my workers. So why can’t a worker have this information? It would be nice to see in my MTurk dashboard precisely how many hours I worked and how much money I made.

As an ethical researcher, this part of my experience struck me the hardest. Now I cannot help but feel guilty and even embarrassed about how much (or actually, how little) money I have been paying participants of my studies. While it is true that I want to collect data as cheaply and efficiently as possible (I have a limited research budget, after all), I do not want to exploit people who work for me. MTurk is raising the commission it charges researchers on July 22, and I wonder if any of these spoils will be shared with its workers.

2. Not all researchers posting surveys on MTurk are ethical. On MTurk, unethical actions play out in a number of different ways, some small, and others exploitative.

I was surprised by the degree and variety of what I, as an MTurk worker, perceived to be unethical researcher behavior. In answering surveys, I was made to work much longer than promised, refused payment for work I had already done without sufficient explanation or recourse, and I was not provided with an appropriate method to protest or to receive redress. Here are some things I experienced:

Many questionnaires took WAY longer to complete than the researcher had promised. For example, one survey informed me that it would take no more than five minutes and offered me $.25, but then the researcher forced me to watch a video that went on for 11 minutes (and was still going on, at which point I abandoned the study).
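
To put rough numbers on this example, here is a minimal Python sketch of the arithmetic. The function name `effective_hourly_rate` is purely illustrative (it is not something MTurk provides), and the calculation assumes the full 25 cents would actually have been paid.

```python
def effective_hourly_rate(reward_dollars: float, minutes_worked: float) -> float:
    """Convert a flat task reward and the time spent into dollars per hour.

    Illustrative helper only; MTurk does not expose this to workers.
    """
    if minutes_worked <= 0:
        raise ValueError("minutes_worked must be positive")
    return reward_dollars * 60 / minutes_worked


# Rate implied by the researcher's promise: $0.25 for at most 5 minutes.
promised = effective_hourly_rate(0.25, 5)

# Rate after the 11 minutes I had spent when I abandoned the study,
# assuming (optimistically) that the reward would still have been paid.
actual = effective_hourly_rate(0.25, 11)

print(f"promised: ${promised:.2f}/hour")  # promised: $3.00/hour
print(f"actual:   ${actual:.2f}/hour")    # actual:   $1.36/hour
```

In other words, even the promised rate was only about $3 per hour, and by the 11-minute mark the best I could have hoped for was roughly $1.36 per hour, with the figure still falling as the video played on.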

In one instance, my work was rejected because I did not correctly answer one question that the researcher had inserted somewhere within a 100+ question survey to measure my attention. I was not informed at the beginning of the survey that such a thing could happen. There was nothing I could do about this. One wrong answer negated the dozens of right answers I had provided. How could I help feeling exploited?

In another instance, I was supposed to receive a completion code at the end of the study, but there was no completion code; the survey just ended abruptly. Needless to say, I did not earn even the twenty-five cents I had been promised.

In a number of questionnaires I completed over the week, there was no initial informed consent form, and in other cases, even when such a form was provided, no specific researcher was clearly identified. I had no one to ask questions of or complain to. This is particularly problematic when questions are about complex issues like moral decisions (“Would you kill one person you know to save a dozen strangers?”).

Unethical behaviors have many causes, and often arise simply from the ignorance or incompetence of individuals rather than willful subversion. Regardless of motivations, I feel that the site itself makes it easy for researchers to transgress ethical norms. As one example, many researchers posting their surveys use pseudonyms instead of real names, a recipe for encouraging questionable behaviors. As another example, there is no way for a worker to revisit his or her work and see if they actually made the mistake the researcher is accusing them of. Why not force researchers to use their real names and affiliations when posting work on the site?

3. Many studies are poorly designed and slow workers down.

There was a surprising amount of variance in the quality of the questionnaires I answered. While a majority of questionnaires were exemplary, I came across surveys from academic researchers belonging to reputable institutions that were so riddled with grammatical errors and spelling mistakes that it was impossible to understand what I was being asked about. At other times, the flow of questions made little sense and stumped me completely. Dealing with such problems took more time and reduced my earnings rate even further. I feel that quality control is sorely needed among researchers if this site is to remain a viable platform that attracts diligent and thoughtful respondents. For my part, I have been conscientious about quality control in designing my own surveys and intend to be even more watchful in the future.

4. Answering surveys consecutively produces biased responses.

In a single afternoon, when I completed about fifteen surveys one after the other over a period of several hours, I came across the following paragraph in four different studies:

“We go to the city often. Our anticipation fills us as we see the skyscrapers come into view. We allow ourselves to explore every corner, never letting an attraction escape us. Our voice fills the air and street. We see all the sights, we window shop, and everywhere we go we see our reflection looking back at us in the glass of a hundred windows. At nightfall we linger, our time in the city almost over. When finally we must leave, we do so knowing that we will soon return. The city belongs to us.”

Each time, I was instructed by the researcher to count the number of pronouns (“Our”, “we”, “ourselves”) in the paragraph. This paragraph is part of a “pronoun circling task” that is commonly used to prime a concept that psychologists call relational collectivism (a different version of this paragraph uses “I”, “me”, etc. instead to prime an individualism concept, but strangely, I was assigned to the collectivism paragraph in all four studies!). By the time I came across this paragraph the third time that afternoon, I already knew the answer to the question of “how many pronouns?” So I could easily skip reading it. Needless to say, the priming task did not work on me by then.

Over my one week completing MTurk surveys, I filled out scales measuring psychological traits like impulsive buying, moral identity, collectivism, and subjective well-being over and over again. In some cases, the scales were measuring my personality after an experimental manipulation but in other cases, the same scale WAS the manipulation.

This led me to wonder: When psychological researchers analyze their MTurk data or draw inferences from it, do they account for the possibility that their participants may have just completed a similar study, or even been recently primed with a contrasting concept? Are their conclusions still valid? These and a host of related questions need to be answered carefully by researchers to determine the scope and limits of collecting data on MTurk from respondents who complete hundreds of surveys a week.

5. The site is heavily loaded in favor of requesters and against workers.

If I had to summarize my one-week experience as an MTurk worker in one phrase, it would be a “sustained sense of powerlessness.” After I had completed a study, there was no way for me to go back and identify who the researcher was that I had worked for. If a researcher rejected my work and did not pay me, the only thing I could do was send them an email through MTurk. I had no formal way of rating a particular researcher or sharing what I perceived to be an injustice on their part. There was no way for me to complain to the MTurk customer service folks (if such folks even exist) about the potential problems with a survey or researcher.

All of these issues have to do with the way the site is designed and structured. It seems to me that if these issues are not dealt with properly and quickly, the site is eventually going to resemble a child labor operation in its ethos more than a crowdsourced market for labor. There needs to be more balance in power between the employers and workers than currently exists on the site. While these issues are beyond my control, as a researcher, my main takeaway is that I need to be a lot more empathetic to the workers that I employ on the site.

So what lessons did I learn as a researcher from my experience?

My biggest lesson from my experience as an MTurk worker is that I need to pay more for my studies than I have been doing. There is no question in my mind that paying $.40 or $.50 for a survey is too little. I personally believe in fair wages for fair work, so I will significantly increase how much I offer the people who participate in my studies.

I will also be more careful about what sort of studies I conduct on MTurk. Until my experience as a worker, the issue of potential biasing effects of completing a dozen studies one after the other on the quality of responses was not even on my radar screen. For me personally, MTurk data will be appropriate for initial pretests and exploratory research; but it is clear that I need less “professional” respondents who have not been primed with the same concept multiple times in one session to conduct rigorous and valid tests of my research hypotheses!

All in all, my experience as an MTurk worker was a revelation that has made me look at using MTurk as a researcher in an entirely new light.

[Cross-posted from: LinkedIn Pulse]

You might be interested in my follow-up blog post regarding the changes I made after this experience:  Four changes I will make when using Amazon MTurk for research
