Friday, December 23, 2011

How to download every episode of This American Life

I recently discovered this lovely script by one Sean Furukawa.  (I don't know him.)  The script downloads every episode of This American Life in mp3 format.  This American Life is far and away my favorite radio show -- Ira Glass is consistently the best storyteller on air.

Anyway, I'm intimidated by Perl (too hard to read, too many nonalphanumeric characters), so I rewrote the script in Python. The first run will take a long-ish time, since it's downloading all 450+ existing episodes of the program. Subsequent runs of the script will be faster, since they only have to download new episodes. Enjoy!

By the way, AFAIK, this type of webcrawling is completely legal.  The content is already streamable from the TAL website; you're just downloading it, er, a little faster than usual.

That said, if you use this script, I'd recommend making a tax-deductible contribution to This American Life -- it's a great program, worthy of support.  The "donate" button is in the upper-right corner of the This American Life webpage.







#!/usr/bin/python 

# Adapted from: http://www.seanfurukawa.com/?p=246
# Translated from perl to python by Abe Gong
# Dec. 2011

import urllib, glob, datetime

def now():
  """Get the current date and time as a string."""
  return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def log( S ):
  """Write a line to the log file, and print it for good measure."""
  logfile.write(S + '\n')
  print(S)

#Start up a log file
logfile = open( 'tal_log.txt', 'a' )

#Load all the episodes that have already been downloaded; keep the filenames in a list
episodes = [ f.split('/')[-1] for f in glob.glob('episodes/*.mp3') ]
#print episodes

#As of today (12/11/2011) there are 452 episodes, so a count up to 500 should last a long while.
for i in range(1,500):

  #Choose the appropriate filename
  filename = str(i)+'.mp3'
  #Add the URL prefix
  url = 'http://audio.thisamericanlife.org/jomamashouse/ismymamashouse/'+filename
  
  #Check to see if the file has already been downloaded
  if filename not in episodes:
    #Log the attempt
    log( now() + '\ttrying\t' + url )
    
    #Try to download it
    code = urllib.urlopen( url ).getcode()
    if code == 200:
      urllib.urlretrieve( url, filename='episodes/'+filename )
    
      #Log the result -- success!
      log( now() + '\tsaved\t' + filename )
    else:
      log( now() + '\tfile not found' )
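
The script above targets Python 2 (it relies on urllib.urlopen, which Python 3 moved to urllib.request). For anyone running Python 3, here is a rough sketch of the same idea, not a drop-in replacement. The function names, the '.part' temporary-file trick, and the default of 500 are all hypothetical choices made for illustration; the URL prefix is copied from the script above and may no longer work.

```python
# Python 3 sketch of the same download loop (illustrative names throughout).
import os
import shutil
import urllib.error
import urllib.request

# URL prefix taken from the original script; assumed, not verified.
BASE_URL = 'http://audio.thisamericanlife.org/jomamashouse/ismymamashouse/'

def episode_url(number):
    """Build the download URL for one episode number."""
    return BASE_URL + str(number) + '.mp3'

def missing_episodes(directory, max_episode):
    """Return episode numbers in 1..max_episode that have no mp3 in directory."""
    have = set(os.listdir(directory)) if os.path.isdir(directory) else set()
    return [n for n in range(1, max_episode + 1) if str(n) + '.mp3' not in have]

def download_episode(number, directory):
    """Fetch one episode; return True if saved, False on an HTTP error."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, str(number) + '.mp3')
    part = target + '.part'  # download here first so a partial file never counts
    try:
        with urllib.request.urlopen(episode_url(number)) as response:
            with open(part, 'wb') as out:
                shutil.copyfileobj(response, out)
    except urllib.error.HTTPError:
        # Unlike Python 2, urlopen raises for 404 and similar statuses.
        return False
    os.replace(part, target)
    return True

def sync_episodes(directory='episodes', max_episode=500):
    """Try every missing episode; return the numbers that were saved."""
    return [n for n in missing_episodes(directory, max_episode)
            if download_episode(n, directory)]
```

Calling sync_episodes() from the folder that should hold the episodes directory would attempt every missing number up to 500. Logging is left out to keep the sketch short.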
