Monday, January 30, 2012

How to ride, eat, tame, etc. your personal elephant

This is a talk I gave at the annual "Hill Street TED" activity that my local congregation puts together.  These are short talks in TED format, put together by members of the congregation to share aspects of their life and work that don't get talked about much at church.

Here, I've taken my slides from the talk, added a script, and revised the format to better fit the web.  After going back and forth, I left in the Mormon references, even though I know some of them will be lost in translation. I also added a few slides based on comments and feedback from people at the talk. This let me put in more details and one-off ideas that just didn't fit into 10 minutes.  Enjoy!

EDIT: Slideshare is giving me grief, so here are PDF versions of the talk, with and without speaking notes.
View more documents from Abe Gong.

Saturday, January 28, 2012

Announcing QuoteWars2012!

Just in time for the Florida primary, my brother and I have released a site about the 2012 elections: QuoteWars2012.

The site lets you quiz yourself on quotes by Obama, Romney, Gingrich, and other presidential hopefuls.  Think of it as a gamified survey: like a survey, it collects data about public opinion, but it's also designed to be fun and informative.

The site is in public beta -- fun and usable, with a few minor bugs.  We're looking for feedback on how to expand and improve the site.  Please play the game, forward far and wide, and let us know what you think!

Tuesday, January 17, 2012

Don't use NetLogo

A follow-up to yesterday's post on picking programming languages: conditions under which you should program in NetLogo.

The short answer: None.  It amazes me that people will put up with the pain of developing in a language where turtles are one of the primary object primitives.

A slightly longer answer: I can see why some people use NetLogo as a way to learn the basics of agent-based modeling, but I'll never be able to take it seriously as a research tool.

NetLogo is a legacy system with a lot of nifty-looking examples and modules, plus it can run in a browser.  These are its strengths.  But it's built on the Logo programming language, which is hopelessly outdated and was never intended for real number-crunching anyway.  In other words, using NetLogo signals that you don't know how to do real programming.

Monday, January 16, 2012

A presentation on practical approaches to picking programming packages

Last week, I helped kickstart the semester for the Complex Systems grad student group with a presentation on "principles and practices for picking powerful programming platforms, packages, and plugins."

It was a fun talk to give, because 1) it's something I've done many times, 2) it's something that a lot of the other students (and faculty) are affected by, and 3) most people haven't thought about it in great detail. Low-hanging fruit.

I probably came across as a bit of a pythonista. This was partly deliberate: most of the students in Complex Systems learn Java, because that's what's taught in the intro classes, and never get much exposure to other options. I wanted people to know they don't have to spend the rest of their lives typing extraneous semicolons. Or worse -- developing in turtles.

Plus, I am actually, in reality, a bit of a pythonista, so it came naturally.

Here are the slides. They sparked a good discussion. What would you add?

Friday, January 13, 2012

For future reference: upgrading to R 2.14 on Ubuntu 11.10

Today, I needed to upgrade from R 2.13 to R 2.14 on Ubuntu 11.10, because I wanted to install the "tm" package.

It's easy once you know how -- you just need to get your Debian sources.list pointed to the right place. For future reference, here are my notes on how this feat is accomplished.

Instructions for upgrading

List of CRAN mirrors

My favorite mirror:

The line I added to /etc/apt/sources.list:
deb oneiric/

From there, it's just a simple
   sudo apt-get update
   sudo apt-get install r-base
   sudo apt-get install r-base-dev

Tuesday, January 10, 2012

IFTTT confusion: mea culpa, but not really.

I've been playing around with IFTTT, trying to get this blog, my G+, and Facebook accounts to sync.  Apologies for missed posts and weirdness.  (If you don't get this post, please let me know, ha ha.)

For the most part, I blame Google and its lack of a clean API.  I would use Circles more if they worked well!

More about political microtargeting as data science

Two interesting articles about how campaigning is being influenced by data science.

From Slate: Anatomy of a narrow victory: Romney's Iowa win took a lot more than money

And from BusinessWeek (HT Dad): OBAMA Campaign’s Secret Weapon: Geeks.

(Photo from BW.  Love the github ref.)

A few observations on data versus money, data-driven decision making, and whether we really want more data in politics...

Monday, January 9, 2012

This is a test post for IFTTT

Testing, testing...

Data set: pretty much every candidate in any of the US primary elections (house, senate, governor, president)

While working on a project on social media and U.S. primary elections, I couldn't find a good, machine-readable listing of candidates.  (Project VoteSmart doesn't do primaries.)  So I crowd-sourced it on mturk.  Here's the data, in JSON format (documented below).

For each election, we asked for as many candidate names, parties, and campaign websites as turkers could find.  We also asked for websites to verify the information, with the stern warning, "To receive credit for your work, you must include stable URLs to credible sites where you found your information."

Here's a screenshot of the mturk task:

Some details:

We vetted the data by running each task twice and comparing responses.  Wherever we found discrepancies (there weren't very many), we fixed misspellings, checked to make sure candidates were real, etc.  Overall, it worked pretty well.  We found nearly 2,000 candidates across the almost-500 elections.  I'm guessing the final data contain a handful of mistakes, but not many.  It is, as they say, good enough for government work.
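A minimal sketch of the compare-two-runs idea, in Python. It is an illustration only, not the actual vetting script: the function names, the normalization rule (lowercase, collapse whitespace), and the sample names are all assumptions.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def find_discrepancies(run_a, run_b):
    """Return normalized names that appear in only one of the two runs."""
    a = {normalize(n) for n in run_a}
    b = {normalize(n) for n in run_b}
    return sorted(a ^ b)  # symmetric difference: reported by one run only


# Hypothetical responses from two runs of the same task (made-up names).
first_run = ["Jane Doe", "John  Smith", "Pat Jones"]
second_run = ["jane doe", "John Smith", "Pat Jonse"]

flagged = find_discrepancies(first_run, second_run)
print(flagged)
```

Anything that lands in `flagged` still needs a human to look at it: the sketch can tell you two runs disagree, but not which spelling is right or whether the candidate is real.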

Here's the data format.  The main file is an array of election objects:

election :
    id : a unique ID for the election
    office : "president" / "senate" / "house" / "governor"
    state : The name of the state where the election is being held.  N/A for president.
    district : The congressional district where the election is being held.  N/A for everything except house races.
    candidates : an array of candidate objects.

candidate :
    name : the candidate name
    party : The candidate's party (Republican / Democrat / Other)
    websites : an array of website URLs.
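
For anyone who wants to poke at the file, here is a rough Python sketch of walking that structure. The sample records are made up for illustration, and I'm assuming "N/A" is stored as a literal string and that district is a string; only the field names come from the format above.

```python
import json
from collections import Counter

# Hypothetical sample in the format described above; not real records.
SAMPLE = """
[
  {"id": "e1", "office": "house", "state": "Michigan", "district": "15",
   "candidates": [
     {"name": "Candidate A", "party": "Democrat",
      "websites": ["http://example.com/a"]},
     {"name": "Candidate B", "party": "Republican", "websites": []}
   ]},
  {"id": "e2", "office": "president", "state": "N/A", "district": "N/A",
   "candidates": [
     {"name": "Candidate C", "party": "Other",
      "websites": ["http://example.org/c"]}
   ]}
]
"""


def count_candidates_by_party(elections):
    """Tally candidates by party across a list of election objects."""
    counts = Counter()
    for election in elections:
        for candidate in election["candidates"]:
            counts[candidate["party"]] += 1
    return counts


elections = json.loads(SAMPLE)
party_counts = count_candidates_by_party(elections)
print(len(elections), "elections,", sum(party_counts.values()), "candidates")
print(dict(party_counts))
```

To run this against the real file, swap `json.loads(SAMPLE)` for `json.load` on an open file handle.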

I'm putting this out in case anybody else is searching for the same kind of data. I'd love to hear about any mistakes and/or useful applications for it.  Cheers!

Monday, January 2, 2012

Synonyms for "computational social science"

The other day, a friend asked me how to find other "computational social scientists."  Good question.  I wish I had a good answer -- I might be less bored at conferences.

The big data movement is pulling people together from many different backgrounds, and they're still working out a common language to explain what they do.  If I had to guess, either "data science" or "big data" is going to be the label that sticks, but I'm sure there will be plenty of offshoots, like "computational social science."

To get some new ideas, I took a few of my favorite near-synonyms and ran them through Google's keyword tool for AdWords.  Then I branched out and searched a little by hand.  Low-tech, but it works.  Here are some of the terms that popped to the top.
  • computational social science
  • data scientist
  • analytics
  • data mining
  • business intelligence
  • visualization
  • big data
  • knowledge discovery
  • text mining
  • network analysis
  • social media analytics
  • social network mining
  • predictive analytics
  • social analytics
  • business analytics
  • ecommerce analytics
  • machine learning
  • statistical programming
  • predictive modeling
I also found a bunch of articles hyping the emergence of a new data-oriented field.  This stuff makes me feel good about job prospects, so I'll pass a couple along here:

What is Data Science?
Jobs For Data Scientists Explode across the Market