Monday, November 7, 2011

Resources on NLP sentence diagramming

Here are some notes from a recent search for resources on automatic sentence diagramming. I was looking for code or software that diagrams sentences automatically, ideally in Python.


Vocab
As far as I can tell, linguists, grammarians, and English majors call sentence diagrams "Reed-Kellogg" diagrams. NLP and computer science types call the diagrams "parse trees" or "concrete syntax trees," and (usually) produce them using probabilistic context-free grammars (PCFGs).
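To make the PCFG idea concrete, here is a minimal sketch of how a probabilistic grammar turns a sentence into a parse tree. The grammar, its probabilities, and the function name viterbi_parse are all made up for illustration; real parsers learn much larger grammars from treebanks. The sketch uses the standard CKY dynamic program and keeps the most probable tree for each span.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. The rules and probabilities are
# hypothetical, chosen only to illustrate the idea.
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "Det", "N"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (tag, word) -> probability
LEXICAL = {
    ("Det", "the"): 1.0,
    ("N", "dog"): 0.5,
    ("N", "cat"): 0.5,
    ("V", "chased"): 1.0,
}


def viterbi_parse(words):
    """Return (probability, bracketed tree) for the best S parse, or None."""
    n = len(words)
    # best[(i, j)][label] = (probability, tree string) covering words[i:j]
    best = defaultdict(dict)

    # Fill in single-word spans from the lexical rules.
    for i, word in enumerate(words):
        for (tag, w), p in LEXICAL.items():
            if w == word:
                best[(i, i + 1)][tag] = (p, f"({tag} {word})")

    # Build longer spans from pairs of shorter ones (CKY).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        lp, lt = best[(i, k)][left]
                        rp, rt = best[(k, j)][right]
                        prob = p * lp * rp
                        # Keep only the most probable tree per label and span.
                        if prob > best[(i, j)].get(parent, (0.0, ""))[0]:
                            best[(i, j)][parent] = (prob, f"({parent} {lt} {rt})")

    return best[(0, n)].get("S")


print(viterbi_parse("the dog chased the cat".split()))
```

The bracketed string it prints is the usual plain-text way of writing a parse tree. A sentence the toy grammar can't cover returns None.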
 
Search result
If you want code that just works, the Stanford Parser looks like the place to start. It's written in Java, but that shouldn't be much of a problem, because you can call it from the command line.
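Calling the parser from Python could look roughly like the sketch below. I haven't run this against a real install: the command is modeled on the lexparser.sh script that ships with the parser download, and the jar directory, model path, and memory setting are assumptions you should check against your own copy. The helper names build_parser_command and parse_file are mine, not part of the Stanford Parser.

```python
import subprocess

# Main class of the Stanford Parser, as used in its bundled lexparser.sh script.
PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"


def build_parser_command(jar_dir, model, text_file, output_format="penn"):
    """Build the java command line as a list of arguments.

    jar_dir: directory holding the parser jars (assumed location).
    model: path to a grammar such as englishPCFG.ser.gz (assumed path).
    text_file: plain-text file of sentences to parse.
    """
    return [
        "java",
        "-Xmx1g",               # heap size; adjust for long sentences
        "-cp", f"{jar_dir}/*",  # put every jar in the directory on the classpath
        PARSER_CLASS,
        "-outputFormat", output_format,
        model,
        text_file,
    ]


def parse_file(jar_dir, model, text_file):
    """Run the parser and return whatever it writes to stdout."""
    cmd = build_parser_command(jar_dir, model, text_file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Batching then comes down to putting many sentences in one input file, or looping over files, which avoids paying the JVM startup and model loading cost for every sentence.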

Python's NLTK might also work, but it includes several different parsers, and it's not clear whether, or how much, training each one requires.

There are various other software packages out there -- some of them online -- but I doubt they'd support much volume, and batching would be a pain.





Background
http://en.wikipedia.org/wiki/Sentence_diagram
http://en.wikipedia.org/wiki/Parse_tree
http://en.wikipedia.org/wiki/Context-free_grammar

NLTK
http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html
http://www.cs.bgu.ac.il/~elhadad/nlp11/nltk-pcfg.html
http://www.ibm.com/developerworks/linux/library/l-cpnltk/index.html

Stanford Parser
http://nlp.stanford.edu/downloads/lex-parser.shtml
http://nlp.stanford.edu:8080/parser/index.jsp
http://projects.csail.mit.edu/spatial/Stanford_Parser

Other Software & Code
http://www.sendraw.ucf.edu/
http://1aiway.com/nlp4net/services/enparser/reedkelloggtree.aspx
http://faculty.washington.edu/dillon/GramResources/

Funny blog posts about politics and grammar, or the lack thereof
http://boingboing.net/2009/02/17/how-obamas-sentences.html
http://www.slate.com/articles/life/the_good_word/2008/10/diagramming_sarah.html

4 comments:

  1. I love the funny ones. Also, make the links actual links?

  2. Yeah, that'd be nice. I copied these from a txt file, and hoped blogger would just take care of it for me. I'll switch them soon.

    You took NLP at BYU?

  3. Ah gotcha. That really should be something blogger does--if gmail can do it, blogger should too!

    Yeah, I took it from Prof. Ringger. Great guy and a really fun class! I discovered only too late in the semester that there are some really good Estonian/English corpora that I should have used for my final project.
