geasyheart/lofter-spider

Introduction

This is a small spider for crawling Lofter articles, built on the Scrapy framework. For more details, see the official Scrapy documentation.

How to use

  1. In lofter/lofter/spiders/article_spider.py, change start_urls to the first page you want to crawl.

For example:

from scrapy import Spider


class LofterArticleSpider(Spider):
    name = "lofter"

    start_urls = [
        "http://{name}.lofter.com/?page=1",  # replace {name} with your username
    ]
  2. Run the following commands:
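# create and activate a Python 3 virtual environment
virtualenv -p python3 .env
source .env/bin/activate

# install dependencies (-i points pip at the Aliyun PyPI mirror)
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

# create the output directory, then run the spider named "lofter"
cd lofter/ && mkdir articles
scrapy crawl lofter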
  3. The crawled articles are saved in the lofter/articles directory.
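The start URL in step 1 follows a simple pattern: the username is the subdomain and the page number is the `page` query parameter. As a rough sketch, assuming that pattern holds for every listing page (this README does not say how the spider itself moves between pages), the hypothetical helpers `lofter_page_url` and `next_page_url` below build and advance such URLs using only the Python standard library:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def lofter_page_url(name: str, page: int = 1) -> str:
    """Build a listing-page URL in the same shape as the start_urls entry.

    Hypothetical helper: `name` is the blog subdomain, `page` the page number.
    """
    if not name:
        raise ValueError("name must be non-empty")
    if page < 1:
        raise ValueError("page must be >= 1")
    return f"http://{name}.lofter.com/?page={page}"


def next_page_url(url: str) -> str:
    """Return `url` with its `page` query parameter incremented by one.

    Assumption: a missing `page` parameter is treated as page 1.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    page = int(query.get("page", ["1"])[0])
    query["page"] = [str(page + 1)]
    # Rebuild the URL with only the query string changed.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    first = lofter_page_url("example")
    print(first)                 # http://example.lofter.com/?page=1
    print(next_page_url(first))  # http://example.lofter.com/?page=2
```

For instance, `start_urls = [lofter_page_url("yourname")]` would reproduce the default start URL. Whether incrementing `page` is the right way to paginate depends on how Lofter actually serves listing pages, so treat `next_page_url` as an assumption to check against the live site.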
