*Sigh* This article again. The one that says, "Data can't do everything." This time, David Brooks happens to be the one writing it, but it could have been anybody, really. Brooks gives a list of things that he feels data does poorly ("context", "big problems", "the social"), and then concludes with this gem:
"This is not to argue that big data isn’t a great tool. It’s just that, like any tool, it’s good at some things and not at others."
Well, duh!
I'm tired of reading the many incarnations of this article, for two reasons.
- It's obvious. Good data analysts (and anybody with half a brain) are already aware of these kinds of limitations.
- It doesn't move the debate forward. In fact, it clouds the issue.
The debate about data is a debate about scope: "What can and can't be accomplished with data?" This isn't a question that can be resolved using vague generalities. For example, the following logic (based on one of Brooks' rules of thumb) doesn't work: "Well, building a platform where millions of people can share ideas in real time (e.g. Twitter) is a 'big problem,' so I guess it can't be solved with data. But convincing my toddler to stop throwing milk at dinner is a 'small problem,' so bring on the statistics!"
It's as if Brooks is claiming he can fix your car without opening the hood. "You can fix red SUVs by flushing out the engine." "You can answer big, social questions by relying on values." A real mechanic would get inside the machine and actually see how it works. "Hmm... for this particular big, social question, you have lots of data on X and Y, and a little bit on Z, and this portion was captured as part of an experimental design. That means we can infer A, but we can't infer B..."
"I still say we're both entitled to our own methods of fixing the car."
Data can't do everything. Not even close. But we live in a world swimming in data of increasingly useful types. It seems reasonable to think that we'll be able to do more with that data once we figure out what it's good for. And we can't do that by burning the strawman of omnipotent data, or by trading in mushy platitudes. We need to get specific about real questions and data structure.