Web-Scraping using Scrapy

Prerequisites: pip install scrapy

Remember before scraping:

  • Each website publishes a robots.txt file describing what crawlers are allowed to access. Check it before you scrape so that you don't hit disallowed endpoints.
  • If an API is available that provides the same information, use it instead of scraping.
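A quick way to honor those robots.txt rules from code is Python's built-in urllib.robotparser; a minimal sketch, using a made-up robots.txt (the paths are illustrative, not from any real site):

```python
from urllib import robotparser

# A hypothetical robots.txt: everything is allowed except /checkout/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before fetching: can_fetch(user_agent, url) returns True/False.
print(rp.can_fetch("*", "https://example.com/products"))       # True
print(rp.can_fetch("*", "https://example.com/checkout/cart"))  # False
```

In practice you would point the parser at the live file with rp.set_url("https://<site>/robots.txt") followed by rp.read().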

Performed Activities:

  • Learnt using the Scrapy shell: scrapy shell
    • Fetch command: fetch(<put_your_scraping_url_here/endpoint>)
    • The response returned by the crawler can be viewed with view(response). This opens the fetched HTML in the default browser.
    • Print the received response: print(response.text)
    • Extract elements using a CSS selector: response.css(".value::text").extract()
    • Use XPath to get elements: response.xpath("//div").extract()
  • Creating a Scrapy project and custom Spider
    • scrapy startproject aliexpress
    • The command to create a spider inside the project: scrapy genspider aliexpress_tablets <url>

References: