For each election, we asked for as many candidate names, parties, and campaign websites as turkers could find. We also asked for websites to verify the information, with the stern warning, "To receive credit for your work, you must include stable URLs to credible sites where you found your information."
Here's a screenshot of the mturk task:
Some details:
We vetted the data by running each task twice and comparing the responses. Wherever we found discrepancies (there weren't very many), we fixed misspellings, checked that the candidates were real, etc. Overall, it worked pretty well. We found nearly 2,000 candidates across almost 500 elections. I'm guessing the final data contain a handful of mistakes, but not many. It is, as they say, good enough for government work.
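The post doesn't say exactly how the two runs were compared. Here is a rough sketch in Python of one way to flag discrepancies, assuming each run yields a list of candidate dicts with "name" and "party" keys. `normalize` and `find_discrepancies` are hypothetical helpers for illustration, not necessarily what was used to build this dataset.

```python
def normalize(name):
    # Hypothetical helper: case-fold and collapse runs of whitespace
    # so "Jane  Doe" and "jane doe" count as the same name.
    return " ".join(name.split()).casefold()


def find_discrepancies(response_a, response_b):
    """Compare two turkers' answers for the same election (hypothetical helper).

    Each response is assumed to be a list of dicts with "name" and "party" keys.
    Returns names found by only one turker, plus names whose party differs.
    """
    party_a = {normalize(c["name"]): c.get("party") for c in response_a}
    party_b = {normalize(c["name"]): c.get("party") for c in response_b}
    shared = party_a.keys() & party_b.keys()
    return {
        "only_in_a": sorted(party_a.keys() - party_b.keys()),
        "only_in_b": sorted(party_b.keys() - party_a.keys()),
        "party_mismatch": sorted(n for n in shared if party_a[n] != party_b[n]),
    }


# Made-up example responses for one election.
a = [{"name": "Jane Doe", "party": "Democrat"}, {"name": "John Roe", "party": "Republican"}]
b = [{"name": "jane  doe", "party": "Other"}, {"name": "Jon Roe", "party": "Republican"}]
print(find_discrepancies(a, b))
# {'only_in_a': ['john roe'], 'only_in_b': ['jon roe'], 'party_mismatch': ['jane doe']}
```

A script like this only narrows things down: "John Roe" vs. "Jon Roe" gets flagged, but a person still has to decide which spelling is right and whether the candidate is real.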
Here's the data format. The main file is an array of election objects:
election:
  id: a unique ID for the election
  office: "president" / "senate" / "house" / "governor"
  state: the name of the state where the election is being held. N/A for president.
  district: the congressional district where the election is being held. N/A for everything except house races.
  candidates: an array of candidate objects.
candidate:
  name: the candidate's name
  party: the candidate's party (Republican / Democrat / Other)
  websites: an array of website URLs.
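A minimal Python sketch of reading and sanity-checking the format above. It assumes the main file is JSON and that "N/A" fields appear as null; the post states neither, so treat both as assumptions. The file path, `load_elections`, `validate`, `summarize`, and the sample record are all hypothetical.

```python
import json
from collections import Counter

# Allowed values, taken from the format description above.
OFFICES = {"president", "senate", "house", "governor"}
PARTIES = {"Republican", "Democrat", "Other"}


def load_elections(path):
    # Assumption: the main file is a JSON array of election objects.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate(election):
    """Return a list of problems found in one election object (empty if none)."""
    problems = []
    eid = election.get("id")
    if election.get("office") not in OFFICES:
        problems.append(f"{eid}: unknown office {election.get('office')!r}")
    for cand in election.get("candidates", []):
        if cand.get("party") not in PARTIES:
            problems.append(f"{eid}: unknown party {cand.get('party')!r}")
        if not isinstance(cand.get("websites"), list):
            problems.append(f"{eid}: websites should be a list")
    return problems


def summarize(elections):
    """Count elections per office and candidates per party."""
    offices = Counter(e["office"] for e in elections)
    parties = Counter(c["party"] for e in elections for c in e["candidates"])
    return offices, parties


# Hypothetical sample record; "N/A" for district is shown as null (an assumption).
sample = """[
  {"id": "example-1", "office": "senate", "state": "Ohio", "district": null,
   "candidates": [
     {"name": "Candidate A", "party": "Democrat", "websites": ["http://example.com/a"]},
     {"name": "Candidate B", "party": "Other", "websites": []}
   ]}
]"""

elections = json.loads(sample)
offices, parties = summarize(elections)
print(offices["senate"], parties["Democrat"], parties["Other"])  # 1 1 1
print([p for e in elections for p in validate(e)])  # []
```

With the real file you'd call `load_elections(...)` on its path instead of parsing `sample`.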
I'm putting this out in case anybody else was searching for the same kind of data. I'd love to hear about any mistakes and/or useful applications for it. Cheers!