Saturday, April 13, 2013

Python multiprocessing: 8x speed in 8 lines of code

At work last week I demo'ed some python parallel processing tricks.  Nothing fancy -- just standard usage of the multiprocessing library -- but these things can be a revelation if you haven't seen them before.

Like many things in python, basic multiprocessing is very easy: ~8 more lines of code let you use all 8 cores of your laptop.  In practical terms, it's lovely to improve your workflow from "run algorithm preprocessing overnight" to "run algorithm preprocessing during lunch."


Here's how it's done.

from multiprocessing import Pool

def my_function( a_single_argument ):
    # e.g. "accepts a filename and returns results in a tuple"
    ...

my_data_list = [...]
my_pool = Pool()
my_results = my_pool.map( my_function, my_data_list )
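
Filled in with a toy workload, a hypothetical end-to-end version looks something like this (square is just a stand-in for your real work, and the __main__ guard matters on Windows and other spawn-based setups):

from multiprocessing import Pool

def square( x ):
    # toy stand-in for your real preprocessing work
    return x * x

if __name__ == '__main__':    # needed on Windows / spawn-based platforms
    my_pool = Pool()
    my_results = my_pool.map( square, range(10) )
    my_pool.close()
    my_pool.join()
    print( my_results )    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]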

That's all it takes.  A few tips:

First, the mapped function can only accept a single argument -- Pool.map passes each item of the list as-is.  This is usually easy to work around by packing your arguments into a tuple:

def my_function( threeple_arg ):
    arg_1, arg_2, arg_3 = threeple_arg
    ...
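
For example, with some hypothetical names (filenames, config, output_dir), you'd build the list of tuples before mapping:

# hypothetical names: pair each filename with a shared config and output directory
jobs = [ (filename, config, output_dir) for filename in filenames ]
my_results = my_pool.map( my_function, jobs )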

Second, debugging multiprocessing code is a pain.  I often invoke the function serially first, like this, and switch to multiprocessing once I know everything works:

#my_results = [my_function(item) for item in my_data_list]

It's a little hacky, but I'll often leave the line as a comment throughout development, switching between serial and parallel processing as needed.
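
In practice it ends up looking something like this, with whichever line you're not using commented out:

# serial, for debugging:
#my_results = [my_function(item) for item in my_data_list]

# parallel, once everything works:
my_results = my_pool.map( my_function, my_data_list )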

Last, a hint on Pool: you can pass an integer to the constructor to tell it how many worker processes to use:

my_pool = Pool(5)

If you omit the argument, python assumes you want to run as many subprocesses (not threads!) as you have cores (e.g. 8 on a MacBook Pro).  If you want to save some processing power for other tasks, you might want to specify something lower, like 6.
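
If you'd rather not hard-code the number, one option is to size the pool off the standard library's multiprocessing.cpu_count() (the two-core headroom here is just a guess at what you'd want):

import multiprocessing

# leave a couple of cores free for everything else
n_workers = max( 1, multiprocessing.cpu_count() - 2 )
my_pool = Pool( n_workers )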

If you're running tasks with high latency (e.g. web spidering, or lots of disk reads/writes across a network), it sometimes makes sense to use more subprocesses than you have cores.  For example, I'll often throw 40 pool workers at a quick web-scraping script just to speed things up (sketch below).  If performance really matters, though, a Pool full of high-latency tasks is hard to tune and doesn't scale well.  For anything more than a one-off data grab, you'll be better off with a queue-based tool, like scrapy, Amazon SQS, or celery.
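A rough sketch of that quick-grab case (the fetch function and example.com URLs are just stand-ins; a real script would want timeouts and error handling):

from multiprocessing import Pool
import urllib2    # Python 2 standard library; use urllib.request on Python 3

def fetch( url ):
    # each worker spends most of its time waiting on the network,
    # so 40 workers on 8 cores still pays off
    return urllib2.urlopen( url ).read()

if __name__ == '__main__':
    urls = [ 'http://example.com/page/%d' % i for i in range(40) ]    # stand-in URLs
    pages = Pool(40).map( fetch, urls )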

HTH