Monday, October 31, 2011

What is computational social science? Feedback wanted!

At the JITP conference on "the future of computational social science" this spring, the question was raised, "What exactly is computational social science?" After some semantic dithering, there was an awkward pause, and then someone changed the subject.

Having thought about the question in the time since, I'm ready to give a better definition:

Computational social science is research that answers questions in social science using specialized knowledge from computer science.

This definition, especially the "specialized knowledge from computer science" bit, leads in some interesting directions. This post talks through a few of them.

Specialized Knowledge
Here's what I mean by "specialized knowledge." Practically speaking, computer science treats memory, bandwidth, storage, and especially computation as limited resources. Researchers in the field try to allocate those resources efficiently through algorithm design and system architecture. Computer scientists also think a lot about best practices for deploying hardware and developing software.

As a result, this definition ties computational social science partly to the current state of software. As easy-to-use software becomes available, some areas will cease to require specialized knowledge. As that happens, "regular" social science will encroach on areas that were originally computational.

One scenario for future work in computational social science is that a small community of computer-literate developers will supply the rest of the field with software to perform specialized tasks. A similar process took place a generation ago when software packages like SPSS and STATA became available for statistical work.

Another scenario is that resources and skills for compSocSci will be concentrated in the private sector. Many academic researchers see this as a bad thing. The selfish logic of patents and trade secrets could easily lead to hoarding of proprietary data and code, and hold up the progress of scientific discovery. On the other hand, hoarding happens in the academy too, and some tech firms are big proponents of open source, so I think it's an open question which institutional structures will work best.

*Not* Computational
There are at least two types of research that are not computational under my definition. This is a good thing. We want compSocSci to be a big tent, but we also want the term to mean something.  Meaning something implies that there are things that aren't computational.

First, research is not computational just because it uses computers. Thus, most researchers using Lexis-Nexis, STATA, and even Amazon's Mechanical Turk are not doing computational social science, because those applications can be used without specialized knowledge from computer science. By contrast, I would classify most work in R as computational, because it requires knowledge of programming.

Second, research on technology and politics is not necessarily computational. For instance, Karpf's ethnographic work on bloggers and other Internet activists is excellent, but not really computational. (His blogosphere authority index is an exception, since it required knowledge of web development.) Other examples include content analysis of websites, using web surveys to collect data, rhetorical analysis of YouTube videos, and so on. In this work, information technology appears in the research question, but not in the methodology.


Where next?
There are at least five computational areas that can make huge contributions to social science:
  • Information retrieval: techniques for acquiring and storing Big Data
  • Machine learning, NLP, and complex network analysis: statistical techniques for drawing inferences from new types of data
  • Simulation: using computers to explore the behavior of systems
  • Web design and human-computer interaction
  • Computability and complexity theory: two branches of mathematics investigating the nature of information and computation

I'm planning to write about all five of these over the next month or so.

But first, does this definition work? Are there borderline cases I've missed? There's enough interest in compSocSci that I think it's worth thinking through these issues. I'd love to get feedback on these ideas.

Discuss.
