Monday, October 31, 2011

What is computational social science? Feedback wanted!

At the JITP conference on "the future of computational social science" this spring, the question was raised, "What exactly is computational social science?" After some semantic dithering, there was an awkward pause, and then someone changed the subject.

Having thought about the question in the time since, I'm ready to give a better definition:

Computational social science is research that answers questions in social science using specialized knowledge from computer science.
This definition---especially the "specialized knowledge from computer science" bit---leads in some interesting directions. This post talks through a few of them.

Specialized Knowledge
Here's what I mean by "specialized knowledge."  Practically speaking, computer science treats memory, bandwidth, storage, and especially computation as limited resources. Researchers in the field attempt to allocate those resources efficiently through algorithm design and efficient system architecture. Computer scientists also think a lot about best practices for deploying hardware and developing software.

As a result, this definition partly ties computational social science to the current state of software. As easy-to-use software becomes available, some areas will cease to require specialized knowledge. As that happens, "regular" social science will encroach on areas that were originally computational.

One scenario for future work in computational social science is that a small community of computer-literate developers will supply the rest of the field with software to perform specialized tasks. A similar process took place a generation ago when software packages like SPSS and STATA became available for statistical work.

Another scenario is that resources and skills for compSocSci will be concentrated in the private sector. Many academic researchers see this as a bad thing. The selfish logic of patents and trade secrets could easily lead to hoarding of proprietary data and code, and hold up the progress of scientific discovery. On the other hand, hoarding happens in the academy too, and some tech firms are big proponents of open source, so I think it's an open question which institutional structures will work best.

*Not* Computational
There are at least two types of research that are not computational under my definition. This is a good thing. We want compSocSci to be a big tent, but we also want the term to mean something.  Meaning something implies that there are things that aren't computational.

First, research is not computational just because it uses computers. Thus, most researchers using Lexis-Nexis, STATA, and even Amazon's Mechanical Turk are not doing computational social science, because those applications can be used without specialized knowledge from computer science.  I would classify most work in R as computational, because it requires knowledge of programming.

Second, research on technology and politics is not necessarily computational. For instance, Karpf's ethnographic work on bloggers and other Internet activists is excellent, but not really computational.  (His blogosphere authority index is an exception, since it required knowledge of web development.) Other examples include content analysis of websites, web surveys used to collect data, rhetorical analysis of YouTube videos, and so on. In this work, information technology appears in the research question, but not the methodology.

Where next?
There are at least five computational areas that can make huge contributions to social science:
  • Information retrieval: techniques for acquiring and storing Big Data
  • Machine learning, NLP, and complex network analysis: statistical techniques for drawing inferences from new types of data
  • Simulation: using computers to explore the behavior of systems
  • Web design and human-computer interaction
  • Computability and complexity theory: two branches of mathematics investigating the nature of information and computation

I'm planning to write about all five of these over the next month or so.

But first, does this definition work? Are there borderline cases I've missed? There's enough interest in compSocSci that I think it's worth thinking through these issues. I'd love to get feedback on these ideas.


Friday, October 28, 2011

Ruby on rails on EC2 in 20 minutes

I wound up with an extra 20 minutes the other day, so I looked to see how hard it would be to set up a rails server on an EC2 instance.

Answer: very not hard.

I haven't done much with rails in the past (one week of code with my brother looking over my shoulder).

I grabbed the RubyStack 2.3-0 Dev (Ubuntu 10.04) AMI from

Once it initialized, I ssh'ed in and ran:
rails new blog
cd blog
I followed the directions here to edit the gemfile, then ran:

rake db:create
rails server
Tada! Instant rails server.

PS - Yes, yes I know about heroku.  In this case, building the app as an EC2 AMI is an essential part of the project. I want non-programmers to be able to clone the instance, and sharing a public AMI makes things pretty easy.

Thursday, October 27, 2011

JSON editors

I've been doing a lot of work with JSON lately, and realized that it would be handy to have a program to edit and validate JSON syntax.

Turns out I'm not the first person to see this need.  Here are two nice, in-browser JSON editors.  Nifty!
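
If you just want a quick syntax check without opening a browser, Python's standard json module will do it. Here's a minimal sketch. The validate_json helper and the sample strings are my own illustration and have nothing to do with either editor.

```python
import json


def validate_json(text):
    """Return (True, parsed_value) if text is valid JSON, else (False, error_message)."""
    try:
        return True, json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return False, str(err)


# A well-formed document parses into ordinary Python objects.
ok, result = validate_json('{"name": "blog", "tags": ["json", "editors"]}')
print(ok, result)

# A trailing comma is not allowed in JSON, so this one fails with a message
# that points at the offending position.
ok, result = validate_json('{"name": "blog",}')
print(ok, result)

# Pretty-print valid JSON with consistent indentation.
print(json.dumps({"name": "blog", "tags": ["json", "editors"]}, indent=2, sort_keys=True))
```

This only checks syntax. It won't tell you whether the document has the fields you expect.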

Wednesday, October 26, 2011


Here's a great little site that lets you test regular expressions. The splash page says:
Welcome to RegExr 0.3b, an intuitive tool for learning, writing, and testing Regular Expressions. Key features include:

  • real time results: shows results as you type
  • code hinting: roll over your expression to see info on specific elements
  • detailed results: roll over a match to see details & view group info below
  • built in regex guide: double click entries to insert them into your expression
  • online & desktop: or download the desktop version for Mac, Windows, or Linux
  • save your expressions: My Saved expressions are saved locally
  • search Community expressions and add your own
  • create Share Links to send your expressions to co-workers or link to them on Twitter or your blog [ex.]

Built by with Flex 3 and Spelling Plus Library for text highlighting.
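
For comparison, here's roughly what the "detailed results" feature amounts to if you do it by hand with Python's re module. This is just a sketch: the date pattern, the sample text, and the describe_matches helper are made up for illustration and are not part of RegExr.

```python
import re

# Hypothetical pattern: dates written like "October 31, 2011", with named groups.
DATE_PATTERN = re.compile(r"(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})")


def describe_matches(pattern, text):
    """List every match with its text, its (start, end) span, and its named groups."""
    return [
        {"match": m.group(0), "span": m.span(), "groups": m.groupdict()}
        for m in pattern.finditer(text)
    ]


sample = "Posted Monday, October 31, 2011 and Friday, October 28, 2011"
for info in describe_matches(DATE_PATTERN, sample):
    print(info["match"], info["span"], info["groups"])
```

An interactive tester is still faster for exploring a pattern, but a loop like this is handy once the expression ends up in a script.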

Tuesday, October 25, 2011

Repost: Analysis of Steve Jobs tribute messages.

Here.  The analysis is interesting, even if the presentation is kind of slow. Full source code (Python and NLTK, natch) is included.

HT FlowingData

Monday, October 24, 2011

Computational social science

Welcome to my computational social science blog!  In fact, so few other blogs cover this topic that it's pretty close to being "the" computational social science blog.

My goal is to document ideas about computational social science as I encounter them in the course of my work.  I'll post links, papers, scripts, software, and other bits and pieces. Over time, I hope this will grow into a useful repository for people interested in using computers to study social dynamics. If the blog gathers enough like-minded readership, it might also turn into a good place for discussion.

I'll be setting up formatting, etc. in my spare time over the next couple of weeks.  Please let me know if you run into rough edges.