Rewriting the parsing code to use HTMLParser
author Dylan Lloyd <dylan@psu.edu>
Tue, 25 Jan 2011 02:24:06 +0000 (21:24 -0500)
committer Dylan Lloyd <dylan@psu.edu>
Tue, 25 Jan 2011 02:24:06 +0000 (21:24 -0500)
commit f365c39a36f222266a8140d702dcd4c6549a6431
tree bdaa232120309e1bcd6d841729bfe588ba2a5314
parent f4f8bf05055533b23cf51b6949988d1c8029cf26

HTMLParser is part of the standard library and should be faster too, although I'd imagine the biggest bottleneck is still fetching the files with urllib.

To keep things simple, I made a new file called htmlparse.py to work on this for now. I'll probably merge it with the existing code later.
htmlparse.py [new file with mode: 0755]
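
For readers unfamiliar with HTMLParser, here is a minimal sketch of the subclass-and-feed pattern it uses. This is not the contents of htmlparse.py: the class name LinkCollector, the helper extract_links, and the choice to collect href values are all hypothetical, picked only for illustration. It also assumes Python 3, where the module is html.parser (in 2011-era Python 2 it was named HTMLParser).

```python
# Python 3 module name; on Python 2 the import was `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: gathers href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed a string of HTML to the parser and return the hrefs it saw."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

In real use the string passed to extract_links would be the decoded body of a page fetched with urllib, which is the step the commit message expects to dominate the running time.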