Python Web Scraping Tools

Just a note for myself, listing interesting Python tools for my next web scraping project:

  • urllib2: extensible library for opening URLs (Python 2 only; in Python 3 its functionality lives in urllib.request)
  • PyQuery: jQuery-like traversing and selecting for Python
  • mechanize: stateful programmatic web browsing in Python
  • Beautiful Soup: at the time of writing, not very actively maintained; the latest versions are rather slow and buggy
  • Scrapy: looks nice; it also handles the URL requesting part, with cookie support and similar features
  • lxml.html: lxml's HTML parsing module (lxml is a Pythonic binding for the libxml2 and libxslt libraries)
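urllib2's extensibility comes from pluggable handlers chained into an "opener". One common use is keeping cookies between requests, which is part of what makes stateful tools like mechanize and Scrapy convenient. The block below is a minimal sketch that assumes Python 3 (urllib.request) and only the standard library. `build_scraping_opener` and the default user-agent string are hypothetical names chosen for illustration.

```python
import http.cookiejar
import urllib.request


def build_scraping_opener(user_agent="my-scraper/0.1"):
    """Return (opener, jar): an opener that keeps cookies across requests.

    The default user-agent is a hypothetical placeholder; replace it with
    something that identifies your scraper.
    """
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies set by responses in `jar` and sends
    # them back on later requests made through the same opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

Calling `opener.open(url)` then fetches pages the usual way, with any cookies from earlier responses sent along automatically. Other handlers, such as proxy or authentication handlers, can be passed to `build_opener` in the same way.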

Probably going with Scrapy.