Also, my posts will focus on "data science" instead of "computational social science." They're pretty much the same thing, but "data science" seems to be the phrase that's catching on.
See you there!
Thoughts on computation, social science, and lifehacking
from an up-and-coming data scientist.
For python, I'd highly recommend installing ipython* and getting your feet wet with the pandas library. The matplotlib, json, and requests libraries are also worth knowing your way around. numpy and scipy have good stuff in them, but they're so huge that it's hard to really "know" them. boto is also worth knowing, but it's only useful if you have a subscription to Amazon Web Services (AWS).
For hadoop, I'd look specifically at mrJob. It's a python-based wrapper that makes hadoop easier to use. mrJob is too new to be covered in books yet, but the online documentation is pretty good. You can use it in test mode even without installing hadoop.
Disclaimer: I know that talking up one language over another is one of the cardinal sins of code. I don't want to start a religious war here -- I'm not saying that python is better than C++, or that pandas is the only way to manipulate data in python (although I believe Wes McKinney deserves a platinum-plated "better than sliced bread" award) -- I'm just saying that in the rough-and-tumble of the hiring process, data scientists with real python skills have a big advantage.
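If you haven't seen the map/reduce pattern that mrJob wraps, here's a rough sketch of it in plain python. To be clear, this is not mrJob's API: the names mapper, reducer, and run_job are made up for illustration, and everything runs in a single process with no hadoop involved. It's only meant to show the shape of the idea with a word count.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # the "shuffle" step: collect values per key
    return dict(reducer(key, values) for key, values in grouped.items())


word_counts = run_job(["the quick brown fox", "the lazy dog", "The end"])
print(word_counts["the"])
```

A real hadoop job spreads the map and reduce steps across many machines, but the two functions you write look roughly like the ones above.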
Any graph that includes the caption "lunch" is a good graph. Also "nap."
Quick question -- I'm writing up a paper right now and need to stick some simple graphs in. Do you have any suggestions for ways to make graphs that are prettier than Excel to Word (low bar...ha ha! Accidental pun!)?
My response:
Love the pun. :) [Miscellaneous personal stuff...]
On graphs: How many graphs are we talking? If it's just a handful, or if they're all different kinds, I'd recommend Photoshop or Illustrator. Import the graph from excel, and then "trace over" it to give it the styling you'd like. A lot of great data-centric presentations use this trick.
Another option is tableau. It's pricey, but it gives you good tools for designing nice-looking graphs, as well as tools for automating them (e.g. generating 20 graphs with the same basic template). You might be able to use a 30-day trial, and maybe their student licenses are cheaper than the corporate ones.
If you don't want to shell out for tableau and you're doing *lots* of graphs in the same style, then it might be worth climbing the learning curve for matplotlib, ggplot2, or the google charts API. I doubt this is worth your time, though, because the learning curve is so long: each of these is a graphing library on top of a programming language, and you'd need facility with both to make them work.
Dear python/ABM enthusiast -
Glad you're interested in python and ABMs. I started on tengolo after a thorough search turned up no good ABM frameworks in python. I worked on it for a short while, then moved on when my dissertation committee told me to focus on stuff that would actually help me graduate. :)
I got far enough in to be confident that a python-based ABM framework like tengolo could work. All the code is in the github repository, and every month I get questions from people asking if it's being actively developed. There's clearly demand for the project, but I don't have time to support it at this point. I'd love to see someone take this ball and run with it.
Best,
Abe
"This is not to argue that big data isn’t a great tool. It’s just that, like any tool, it’s good at some things and not at others."
Well, duh!
"I still say we're both entitled to our own methods of fixing the car."