In this repository, we use Python's Beautiful Soup library (bs4) to scrape websites. Web scraping is the technique of bringing the HTML that a server sends into Python and extracting data from it, instead of handing it to a browser to display.
Beautiful Soup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
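The two steps described above can be sketched as follows: download the HTML with requests, then parse it with Beautiful Soup instead of rendering it. The function names fetch_html and page_title are illustrative assumptions, not code from this repository. The sketch uses Python's built-in "html.parser"; passing "html5lib" instead also works once html5lib is installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text (the part a browser would normally render)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def page_title(html: str) -> str:
    """Parse HTML with Beautiful Soup and return the text of its <title> tag, or "" if absent."""
    soup = BeautifulSoup(html, "html.parser")  # "html5lib" can be passed here once installed
    return soup.title.get_text(strip=True) if soup.title else ""
```

For example, page_title(fetch_html("https://example.com")) downloads that page and returns the text of its title. Keeping the download and the parsing in separate functions makes the parsing step easy to test on saved HTML without hitting the network.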
There are two common ways to get data from a website:
- Using an API
- HTML web scraping with a tool like bs4
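To illustrate the difference between the two approaches, here is a minimal sketch that extracts the same two titles once from a JSON API response and once from an HTML page. Both response strings, the "articles" key and the "title" class are hypothetical; in practice the strings would come from requests.get(url).text.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical responses carrying the same data in two formats.
API_RESPONSE = '{"articles": [{"title": "First post"}, {"title": "Second post"}]}'
HTML_RESPONSE = """
<html><body>
  <h2 class="title">First post</h2>
  <h2 class="title">Second post</h2>
</body></html>
"""


def titles_from_api(json_text: str) -> list[str]:
    """With an API the data is already structured: decode the JSON and read the fields."""
    data = json.loads(json_text)
    return [article["title"] for article in data["articles"]]


def titles_from_html(html_text: str) -> list[str]:
    """Without an API we must locate the data inside the page markup ourselves."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="title")]


print(titles_from_api(API_RESPONSE))
print(titles_from_html(HTML_RESPONSE))
```

When a site offers an API, it is usually the more stable choice, because HTML scraping depends on tag and class names that can change whenever the page is redesigned.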
To scrape websites with Python, we don't have to write everything from scratch: we can use libraries that experts have already written and tested. Why take the hard path when a few lines of code give the same result in far less time?
The required modules are easy to install. Open a command prompt and run these three commands one by one:
- pip install requests
- pip install html5lib
- pip install bs4
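After installing, a short check like the one below can confirm that each module imports correctly. This is a hypothetical helper, not part of the repository; it reports each module's __version__ attribute, or "unknown" if a module does not define one.

```python
import importlib

# Import names of the three packages installed above (bs4 is Beautiful Soup).
MODULES = ["requests", "html5lib", "bs4"]


def check_modules(names):
    """Try to import each module and return {name: version}, with None for missing modules."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_modules(MODULES).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a module shows as NOT INSTALLED, rerun the matching pip command, making sure pip belongs to the same Python interpreter that runs the scripts.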
Scripts in this repository:
- scrape.py - scrapes a simple website
- blog_scrape.py - scrapes data from a blog
- wikipedia_scrape.py - scrapes data from Wikipedia
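The scripts listed above are not reproduced here. As a rough idea of the kind of extraction a blog scraper performs, the sketch below pulls the title, link and summary out of each post. The markup, the "post" and "summary" class names, and the extract_posts function are hypothetical; real sites use their own structure, so inspect the page in your browser's developer tools first.

```python
from bs4 import BeautifulSoup

# Hypothetical blog markup used only for illustration.
BLOG_HTML = """
<div class="post">
  <h2><a href="/posts/first">First post</a></h2>
  <p class="summary">Intro to scraping.</p>
</div>
<div class="post">
  <h2><a href="/posts/second">Second post</a></h2>
  <p class="summary">Parsing with bs4.</p>
</div>
"""


def extract_posts(html: str) -> list[dict]:
    """Return the title, link and summary of every <div class="post"> block."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for post in soup.find_all("div", class_="post"):
        link = post.find("a")  # first <a> tag inside this post
        summary = post.find("p", class_="summary")
        posts.append({
            "title": link.get_text(strip=True),  # visible text of the link
            "url": link.get("href"),  # value of the href attribute
            "summary": summary.get_text(strip=True) if summary else "",
        })
    return posts


for post in extract_posts(BLOG_HTML):
    print(post)
```

The same pattern (find the repeating container, then read text and attributes from the tags inside it) carries over to other pages such as Wikipedia articles; only the tag and class names change.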