Thursday, March 22, 2012

Pain points in mturk

I posted a couple days ago on skimming and cherry-picking on mturk.  Today I want to add to my list of pain points.  These are things that I've consistently faced as I've looked at ways to integrate mturk into my workflow.  Amazon, if you want even more of my research budget, please do the following.  Competitors, if you can do these things, you can probably give Amazon a run for its money.



Here's my list:

1. Provide tools for training turkers.
Right now, HITs can only cover very simplistic tasks, because there's no good way to train turkers to do anything more complicated.  There should be a way for requesters to train (e.g. with a website or video) and evaluate turkers before presenting them with tasks.  It's not really fair to impose the up-front cost of training on either the turkers or requester alone, so maybe Amazon could allow requesters to pay turkers for training time, but hold the money in escrow until turkers successfully complete X number of HITs.
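There's no escrow mechanism today, but a requester can roughly approximate one with deferred bonuses.  Here's a sketch using the boto library's GrantBonus call (which does exist); the threshold, the bonus amount, and the HIT-id list are placeholders of mine, not an mturk feature:

```python
# Rough requester-side approximation of "training pay in escrow":
# pay for training time as a deferred bonus, released only after a
# worker completes REQUIRED_HITS approved HITs.
from collections import defaultdict
from boto.mturk.connection import MTurkConnection
from boto.mturk.price import Price

REQUIRED_HITS = 10             # "X number of HITs" from the proposal
TRAINING_BONUS = Price(1.50)   # what the training time was worth, in USD

conn = MTurkConnection()       # credentials from ~/.boto or the environment
batch_hit_ids = []             # fill in with the HIT ids from your batch

# Tally approved assignments per worker across the batch.
approved = defaultdict(list)
for hit_id in batch_hit_ids:
    for a in conn.get_assignments(hit_id, status='Approved', page_size=100):
        approved[a.WorkerId].append(a.AssignmentId)

# Release the "escrowed" training pay once a worker clears the bar.
# GrantBonus must reference one of the worker's own assignments.
for worker_id, assignment_ids in approved.items():
    if len(assignment_ids) >= REQUIRED_HITS:
        conn.grant_bonus(worker_id, assignment_ids[0], TRAINING_BONUS,
                         reason="Thanks for completing the training module.")
```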

2. Make it easy to communicate with turkers.
This suggestion goes hand-in-hand with the previous one.  Right now it's very difficult to communicate with turkers.  I understand that one of the attractions of the site is the low-maintenance relationship between requesters and turkers.  But sometimes it would be nice to clear that barrier, in order to clarify a task, give constructive feedback, or maybe even -- call me crazy -- say "thank you" to the people who help you get your work done.  It's possible now, but difficult.  (Turkers consistently complain about this lack as well.)
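For what it's worth, the machinery already exists: the API exposes a NotifyWorkers operation, it's just buried behind scripting.  A minimal sketch with the boto library; the worker id and the message wording are placeholders of mine:

```python
# Minimal sketch: contact workers through the API's NotifyWorkers operation.
from boto.mturk.connection import MTurkConnection

conn = MTurkConnection()  # credentials from ~/.boto or the environment
conn.notify_workers(
    worker_ids=['A1EXAMPLEWORKERID'],
    subject="Thanks, and one clarification",
    message_text="Thank you for the careful work on my labeling batch.  "
                 "One clarification: please treat 'unclear' and 'off-topic' "
                 "as separate labels.",
)
```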

3. Make it easy to accept results based on comparisons.
Monitoring HIT quality is a pain, but it's absolutely necessary, because a handful of turkers do cheat consistently.  Some of them even have good acceptance ratings.  I often get one or two HITs with very bad responses at the beginning of a batch.  I suspect that these are cheaters testing to see if I'm going to accept their HITs without looking.  In that case, they'd have a green light to pour lots and lots of junk responses into the task with little risk to their ratings.

As long as it's easy to get away with this approach, cheaters can continue to thrive.  "Percentage of accepted tasks" is a useless metric when requesters don't actually screen tasks before accepting them.  What you want is the percentage of tasks that were screened AND accepted.  Some basic, built-in tools for assessing accuracy and reliability would make that possible, effectively purging the market of cheaters.
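Until Amazon builds this in, a requester can roll a crude comparison-based screen by posting each item redundantly and scoring workers against the per-item majority.  A minimal sketch, assuming that data layout; the 0.5 agreement threshold is an arbitrary choice of mine:

```python
# Sketch of comparison-based screening: score each worker by agreement
# with the per-item majority, and route low scorers to manual review
# instead of auto-accepting.
from collections import Counter

def screen(responses, threshold=0.5):
    """responses: {item_id: [(worker_id, answer), ...]} from redundant HITs.
    Returns {worker_id: agreement_rate} for workers below the threshold."""
    agree, total = Counter(), Counter()
    for pairs in responses.values():
        majority, _ = Counter(ans for _, ans in pairs).most_common(1)[0]
        for worker_id, answer in pairs:
            total[worker_id] += 1
            agree[worker_id] += (answer == majority)
    rates = {w: agree[w] / float(total[w]) for w in total}
    return {w: r for w, r in rates.items() if r < threshold}

# Example: w3 never matches the majority, so w3 gets flagged for a manual look.
suspect = screen({'q1': [('w1', 'cat'), ('w2', 'cat'), ('w3', 'dog')],
                  'q2': [('w1', 'red'), ('w2', 'red'), ('w3', 'blue')]})
# suspect == {'w3': 0.0}
```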

4. Provide a way for small batches to get more visibility.
One of my main reasons for going to mturk is quick turnaround.  In my experience, getting results quickly depends on two things: price and visibility.  Price is easy to control.  I have no complaints there.  But visibility depends largely on getting to the top of one of mturk's pages: especially most HITs or most recent.  If you have 5,000 HITs, your task ends up on the front page and it will attract a lot of workers.  But attracting attention to smaller tasks is harder.*  Mturk should provide a way to queue small batches and ensure that they get their fair share of views.**

5. Prevent skimming and cherry-picking.
I've written about this in my last post.  Suffice it to say that mturk's system currently rewards turkers for skimming through batches of HITs to cherry-pick the easy ones.  This is not fair to other workers, wastes time overall, wreaks havoc on most approaches for determining accuracy, and ruins the validity of some kinds of data.  I can't blame turkers for being smart and strategic about the way they approach the site, but I can blame Amazon for making counterproductive behavior so easy.  Add a "Turkers can only accept HITs in the order they're presented" flag to each batch, and the problem would be solved!


Looking back over this list, I realize that it's become a kind of freakonomics*** for crowdsourcing.  There are a lot of subtle ways that a crowdsourcing market can fail, and devious people have discovered many of them.  In the case of mturk, it's a market in a bottle, so you'd think we could do some smart market design and make the whole system more useful and fair for everyone.




* Right now, one strategy is to dole out the HITs one at a time, so that each one will constantly be at the top of the "most recent" page. But this makes it hard for turkers to get in a groove.  It also requires infrastructure -- a server programmed to submit HITs one by one.  Most importantly, it essentially amounts to a spam strategy, with all requesters trying to attract attention by being loud and obnoxious.  You can't build an effective market around that approach.
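For concreteness, the drip-feed workaround looks roughly like this in boto; the HIT type id, task URL, and 60-second pacing are placeholders of mine:

```python
# Sketch of the drip-feed workaround: create HITs one at a time so each
# briefly sits atop the "most recent" sort.
import time
from boto.mturk.connection import MTurkConnection
from boto.mturk.question import ExternalQuestion

conn = MTurkConnection()
question = ExternalQuestion("https://example.org/my-task", frame_height=600)

for _ in range(200):  # the whole batch, one HIT at a time
    conn.create_hit(hit_type="YOUR_HIT_TYPE_ID", question=question)
    time.sleep(60)    # pace submissions so each new HIT resurfaces on top
```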

** Sites like CrowdFlower are trying to address this need.  I haven't used them much -- relying more on homegrown solutions -- so maybe this is a concern that's already been addressed.

*** The original freakonomics, about evidence of cheating in various markets, before the authors turned it into a franchise and let popularity run ahead of their evidence.
