johnsonwangzs / WebSpider Public

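Notes and exercises written while working through the book 《Python3网络爬虫开发实战》 ("Python 3 Web Crawler Development in Practice"). Most of the material is an implementation of the explanations and case studies in the book.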


WebSpider

The folders are organized as follows:

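【ex_aiohttp】Basic usage of the aiohttp library
【ex_captcha】CAPTCHA handling (tesserocr and OpenCV)
【ex_coroutine】Basic usage of coroutines
【ex_dataFrame_usage】Several data storage formats (CSV/JSON/MySQL/MongoDB)
【ex_httpx】Basic usage of the httpx library
【ex_logging】Basic usage of the logging library
【ex_login】Simulated login (Session + Cookie, and JWT)
【ex_multiProcessing】Basic usage of multiprocessing
【ex_regex】Basic usage of regular expressions
【ex_requests】Basic usage of the requests library
【ex_selenium】Usage of the Selenium library
【ex_urllib】Basic usage of the urllib library
【ex_xpath】Basic usage of the lxml library
【webSpiders】Hands-on crawler exercises
- 【demo_webSpider_1】Uses requests to crawl all content of https://ssr.scrape.center and stores it locally as JSON.
- 【demo_webSpider_2】A requests-based script that automates reporting back from leave (written for the school's information system).
- 【demo_webSpider_3】Uses requests with multiple processes to crawl an Ajax page, taking https://spa1.scrape.center as the example.
- 【demo_webSpider_4】Uses aiohttp to crawl https://spa5.scrape.center/ asynchronously (coroutines plus multiple processes). The site is rendered by JavaScript and its data can be fetched through an Ajax API; it has no anti-scraping or encryption measures.
- 【demo_webSpider_5】Uses requests to crawl the Douban Movie Top 250 list and the detail page of each movie.
- 【demo_webSpider_6】Uses Selenium to crawl https://spa2.scrape.center.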
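
As a rough illustration of the Ajax-crawling idea behind demo_webSpider_3, the sketch below pages through a JSON list API and collects the results. It is not the code from this repository. The endpoint path `/api/movie/`, the `limit`/`offset` query parameters, and the `results` key are assumptions about how such an API might look, and all function names are hypothetical. To stay dependency-free it uses `urllib` from the standard library rather than requests, and it runs sequentially rather than with multiple processes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint layout; the README does not state the API path.
API_BASE = "https://spa1.scrape.center/api/movie/"
PAGE_SIZE = 10


def page_url(page, base=API_BASE, limit=PAGE_SIZE):
    """Build the list URL for a 1-based page number using limit/offset paging."""
    query = urlencode({"limit": limit, "offset": (page - 1) * limit})
    return f"{base}?{query}"


def fetch_json(url, timeout=10):
    """Download one URL and decode its JSON body (standard library only)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl(pages, fetch=fetch_json):
    """Collect the 'results' entries (an assumed key) from each requested page.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    items = []
    for page in pages:
        payload = fetch(page_url(page))
        items.extend(payload.get("results", []))
    return items


def to_json(items):
    """Serialize items as readable JSON, keeping non-ASCII text such as titles."""
    return json.dumps(items, ensure_ascii=False, indent=2)
```

Calling `crawl(range(1, 3))` would request the first two pages over the network, and the string returned by `to_json` can then be written to a local file. Passing a stub as `fetch` keeps the paging logic testable without touching the site.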
