Fri 14 May 2010, by Seppe "Macuyiko" vanden Broucke
Just a note to myself: a list of interesting Python tools for my next web scraping project.
- urllib2: extensible library for opening URLs
- PyQuery: jQuery-like traversing and selecting for Python
- mechanize: stateful programmatic web browsing in Python
- Beautiful Soup: not really maintained anymore; the latest versions are rather slow and buggy
- Scrapy: looks nice; also handles the URL requesting part, with cookie support and such
- lxml.html: the HTML package of lxml, a Pythonic binding for the libxml2 and libxslt libraries
Probably going with Scrapy.
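
For comparison, here is a rough sketch of the baseline all of these tools improve on: fetch a page, pull out its links. It assumes Python 3, where urllib2 lives on as urllib.request, and uses only the standard library. The names LinkCollector, extract_links and fetch are made up for this note and do not come from any of the libraries above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it (assumes UTF-8 if no charset is declared)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # A hard-coded page keeps the example runnable without network access.
    page = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
    print(extract_links(page, "http://example.com/index.html"))
    # prints ['http://example.com/about', 'http://example.org/x']
```

Passing the result of fetch(url) to extract_links instead of the hard-coded string gives the whole fetch-then-parse step. Cookies, forms and crawling are the parts that mechanize and Scrapy add on top.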