While working through the book Web Scraping with Python, I am fixing the bugs in its examples that were caused by changes to the example website.
This repository contains fixed source code of examples from the book Web Scraping with Python.
E-mail: siyao.chen92@gmail.com
The first bug comes from an update to the example website. The URL and link pattern passed to 'link_crawler' should now be as follows:
link_crawler('http://example.webscraping.com/places', '/places/default/(index|view)', delay=0, num_retries=1, user_agent='BadCrawler')
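
To see which links the pattern '/places/default/(index|view)' would let through, here is a minimal sketch. It assumes the crawler filters each link path with `re.match`, which may differ from the code in this repository. The sample paths and the helper name `matches_link_regex` are hypothetical and used for illustration only.

```python
import re

# Pattern taken from the link_crawler call above.
LINK_REGEX = '/places/default/(index|view)'

# Hypothetical paths for illustration; the real site may use other URLs.
sample_paths = [
    '/places/default/index/1',
    '/places/default/view/Afghanistan-1',
    '/places/default/user/login',
    '/default/view/Afghanistan-1',
]


def matches_link_regex(path, link_regex=LINK_REGEX):
    """Return True if the pattern matches at the start of the path.

    Assumption: the crawler uses re.match, which anchors at the
    beginning of the string, so the '/places' prefix is required.
    """
    return re.match(link_regex, path) is not None


# Keep only the paths the crawler would follow under this assumption.
followed = [p for p in sample_paths if matches_link_regex(p)]
print(followed)
```

Under that assumption, only the index and view pages with the '/places' prefix are kept, which is why the old pattern without the prefix stops matching after the site update.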
This repository contains source code of examples from the book Web Scraping with Python, published by Packt Publishing. Examples have been tested with Python 2.7 and depend on:
- BeautifulSoup (Ch 2)
- lxml (Ch 2-9)
- pymongo (Ch 3-5, 9)
- PyQt / PySide (Ch 5)
- ghost (Ch 5)
- Selenium WebDriver (Ch 5, 9)
- mechanize (Ch 6)
- PIL / Pillow (Ch 7)
- pytesseract (Ch 7)
- scrapy (Ch 8)
- portia (Ch 8)
- scrapely (Ch 8)
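
A quick way to see which of these dependencies are installed is to try importing each one. The sketch below is only a rough check: the import names in `MODULES` are assumptions based on how these packages are commonly imported and may not match your installed versions, portia is left out because it is not assumed to be importable as a single module, and the helper names `can_import` and `check_modules` are hypothetical.

```python
import importlib

# Assumed import names for each dependency; adjust to your environment.
MODULES = {
    'BeautifulSoup': ['bs4'],
    'lxml': ['lxml'],
    'pymongo': ['pymongo'],
    'PyQt / PySide': ['PyQt4', 'PySide'],
    'ghost': ['ghost'],
    'Selenium WebDriver': ['selenium'],
    'mechanize': ['mechanize'],
    'PIL / Pillow': ['PIL'],
    'pytesseract': ['pytesseract'],
    'scrapy': ['scrapy'],
    'scrapely': ['scrapely'],
}


def can_import(name):
    """Return True if the module imports without raising an error."""
    try:
        importlib.import_module(name)
    except Exception:
        # Covers ImportError as well as packages that fail while loading.
        return False
    return True


def check_modules(modules):
    """Map each label to True if any of its candidate modules imports."""
    return dict(
        (label, any(can_import(name) for name in names))
        for label, names in modules.items()
    )


if __name__ == '__main__':
    for label, ok in sorted(check_modules(MODULES).items()):
        print('%-20s %s' % (label, 'ok' if ok else 'missing'))
```

A dependency reported as missing only matters for the chapters listed next to it above.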
These examples may break in the future as websites change and dependencies are updated, so bug reports and patches are welcome.