Monday, March 19, 2012

Market failure in Mechanical Turk: Skimming and cherry-picking

This is the first in a series of three posts -- a trilogy! -- about pain points on Amazon's Mechanical Turk, from a requester's perspective.



I'm a frequent user of mturk.  I like the service, and spend a large fraction of my research budget there.  That means I also feel its limitations pretty acutely. Today I want to write about a problem that I've noticed on mturk: skimming and cherry-picking.  (A few weeks ago, I complained about Ubuntu.  Why is it that we only hurt the computing systems we love?)

Here's the problem: even within a batch, not all HITs are equally difficult. I've discovered that some workers (smart ones) will skim quickly through a batch and cherry-pick the easy HITs. For instance, given a list of blog posts to read and evaluate, some turkers will skip the long ones and only code the short ones.

For an individual worker, skimming makes perfect sense.  If you skim, you can certainly make more dollars per hour.  As a bonus, you might even get a higher acceptance rate on your HITs, because short HITs lend themselves to unambiguous evaluation.  The system rewards strategic skimming.

But from a social perspective, skimming is counterproductive.  It wastes time overall, because time spent skimming is time not spent completing tasks*. It's not really fair to other workers. It wreaks havoc on many approaches for determining accuracy. (As a requester, I've experienced this personally.) From a scientific standpoint, it can also ruin the validity of some kinds of data collection.

I first ran into clear evidence of skimming over a year ago.  At first, I didn't want to say anything about it, because I didn't want to give anyone ideas.  At this point, I see it all the time.  One easy-to-observe bit of evidence: the hourly rate on most HITs will start high, and fall over time**.  This is because skimmers grab the quick, easy tasks first, leaving slower tasks for later workers.
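As a rough illustration of why skimming would produce a falling hourly rate, here is a minimal sketch. The durations and reward are made-up numbers, and the assumption that skimmers take HITs strictly shortest-first is a simplification, not something measured on mturk.

```python
def hourly_rates(durations_min, reward_usd):
    """Effective hourly rate for each HIT, in the order the HITs are completed."""
    return [reward_usd * 60.0 / d for d in durations_min]


# Hypothetical batch: every HIT pays the same, but completion times (minutes) differ.
durations = [12, 2, 8, 3, 15, 5]
reward = 0.50

# Assumption: skimmers grab the quickest HITs first, so work gets done shortest-first.
skimmed_order = sorted(durations)
rates = hourly_rates(skimmed_order, reward)

for minutes, rate in zip(skimmed_order, rates):
    print(f"{minutes:>2} min HIT -> ${rate:.2f}/hr")
```

Under these assumptions the early HITs pay $15.00/hr and the last ones $2.00/hr, which is the same downward pattern described above.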

I can't really blame turkers for approaching their work in a clever way.  Instead, I lay the blame on Amazon, for making counterproductive behavior so easy.

It's especially galling because it would be very easy to fix the problem. On the HIT design page, Amazon should add a "Turkers can only accept HITs in the order they're presented" flag to each batch.  For tasks with this flag checked, turkers would be shown one HIT at a time.  They'd be unable to view or accept others in the batch until they'd completed the HIT in front of them***. This would effectively deny turkers control over which HITs they choose to do within a batch****.  It would end the party for skimmers, but make the market more efficient overall.  A simple tweak to the market -- problem solved.

How about it, Amazon?
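To make the proposed flag concrete, here is a minimal sketch of what "one HIT at a time, in order" could look like. `SequentialBatch` and its methods are hypothetical names invented for this illustration; nothing here is part of Amazon's actual API, and the sketch ignores returned or expired HITs.

```python
from collections import deque


class SequentialBatch:
    """Hypothetical batch that hands out HITs strictly in order, one per worker at a time."""

    def __init__(self, hit_ids):
        self._pending = deque(hit_ids)  # HITs nobody has been shown yet
        self._assigned = {}             # worker_id -> HIT currently in progress

    def next_hit(self, worker_id):
        """Return the worker's unfinished HIT, or assign the next one in line."""
        if worker_id in self._assigned:
            # The worker cannot see anything else until this HIT is completed.
            return self._assigned[worker_id]
        if not self._pending:
            return None
        hit = self._pending.popleft()
        self._assigned[worker_id] = hit
        return hit

    def complete(self, worker_id):
        """Mark the worker's current HIT as done so they can be given another."""
        return self._assigned.pop(worker_id)


batch = SequentialBatch(["hit-1", "hit-2", "hit-3"])
first = batch.next_hit("worker-A")
again = batch.next_hit("worker-A")  # asking again does not reveal a new HIT
```

The key design choice is that `next_hit` never exposes the pending queue, so a worker has nothing to skim through.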


* You can think about the social deadweight loss from skimming like this:
Let T be the total time all workers spend completing HITs.  Skimming doesn't change T -- the total amount of task work is constant. But skimming itself is time consuming. Let S be the total time workers spend skimming a given batch; this is the deadweight loss.  Like T, the total wage for a given batch is also constant.  Call it W.

In aggregate, the effective hourly wage for the whole batch without skimming is W/T.  With any amount of skimming (S > 0) it is always less:  W/(T+S). So although skimming may improve the hourly wage of the most aggressive cherry-pickers, it always lowers the average hourly wage across the mturk market as a whole.
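A quick worked example of the W/T versus W/(T+S) comparison, using made-up numbers (a $100 batch, 20 hours of task work, 5 hours of skimming):

```python
def effective_hourly_wage(total_pay, task_hours, skim_hours=0.0):
    """Aggregate hourly wage for a batch: W / (T + S)."""
    return total_pay / (task_hours + skim_hours)


# Hypothetical values for W, T, and S.
W, T, S = 100.0, 20.0, 5.0

wage_without_skimming = effective_hourly_wage(W, T)     # W / T
wage_with_skimming = effective_hourly_wage(W, T, S)     # W / (T + S)

print(f"without skimming: ${wage_without_skimming:.2f}/hr")
print(f"with skimming:    ${wage_with_skimming:.2f}/hr")
```

With these numbers the aggregate wage drops from $5.00/hr to $4.00/hr, even though W and T are unchanged.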

** Yes, yes -- I know that this is not an acid test: there are other explanations for hourly rates that decline over the life of a task.  Still, it's good corroborating evidence for an explanation that makes a lot of sense to begin with.

*** Only viewing one HIT at a time might make it harder for turkers to get a sense of what a given batch is like. There's a simple fix for this as well: allow turkers to see the next k tasks, where k is a small number chosen by the requester. This might make it harder to build a RESTful interface for turkers, though. I haven't thought it through in detail.

**** It's possible that requesters would abuse this power by doing a bait-and-switch: showing easy HITs first and then making them more difficult once workers have invested in learning the task. This seems like a minor concern---if the tasks get tough or boring, turkers can always vote with their feet. But if we're worried about it, there's an easy fix here as well: take control of the HIT sequence away from requesters, just like we took it away from workers. It would be very easy to randomize the order of tasks when the "no skimming" box is checked.  Or allow requesters to click a separate "randomize tasks" box, with Amazon acting as a credible intermediary for the transaction.
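Randomizing the task order really is a few lines of code. Here is a minimal sketch; `randomized_order` is a hypothetical helper, and the optional `seed` is an assumption added so that an intermediary could reproduce or audit the order later.

```python
import random


def randomized_order(hit_ids, seed=None):
    """Return a shuffled copy of hit_ids; the requester's original order is left untouched."""
    rng = random.Random(seed)  # private generator, so the global random state is not affected
    shuffled = list(hit_ids)
    rng.shuffle(shuffled)
    return shuffled


requester_order = ["easy-1", "easy-2", "hard-1", "hard-2", "hard-3"]
served_order = randomized_order(requester_order, seed=2012)
```

Because the shuffle happens on the intermediary's side, neither the requester nor the worker decides which HIT comes next.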

2 comments:

  1. your current HITS pay about 2.50 an hour for a fast turker. You want to drop our wages per hour below third world country income standards? I don't expect to make 10 bucks an hour on Mturk, but i would expect an educated and active Requester to have such a blatant post on how Mturk can LOWER our wages? Go pay min wage for your research and stop screwing us and trying to direct Mturk to packing vaseline on our wounds to ease the entry of further intrusions.

  2. I don't know where you're getting your numbers, friend. I just chalked up totals and averages for the first batch, and the average turker is making $3.93/hr before bonuses and $12.15 afterwards. These rates won't buy a yacht, but they are *very* competitive for mturk.

    The turkers doing the best are fast and accurate. These guys and gals are doing very good work on quick deadlines, and I respect them for it. I'm no millionaire either---finishing this project on nights and weekends on a grad student budget---but I try very hard to pay good rates for high-quality work.

    If you have specific questions or problems with my HITs, please get in touch and I'll try to fix them. If you don't like what I've written about cherry-picking, we'll have to agree to disagree. I see it as a fairness issue: if I skip a slow HIT, someone else is going to get stuck with it. I can't fix everything about mturk, but I'm trying to be fair and open about the work I do there.
