Blacklist domains #94

whalebot-helmsman · 2021-02-10T09:19:28Z

I was setting up AutoExtract in Scrapy Cloud on a project with the Crawlera addon, and the AutoExtract queries were routed through Crawlera. The idea is to blacklist the AutoExtract domain by default. It may make sense for other services too, e.g. Splash.

It should be possible to implement this without adding new options, e.g. by adding something to https://github.com/scrapy-plugins/scrapy-crawlera/blob/019987f68345079db176405c9f9fbb155ee26f20/scrapy_crawlera/middleware.py#L32

Gallaecio · 2021-02-10T10:57:09Z

I would also log a warning the first time it happens during a crawl.
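
For illustration, here is a rough sketch of how the two ideas in this thread (a default domain blacklist, plus a warning the first time it triggers) might fit together. It is not the scrapy-crawlera implementation: `DomainBypass`, `should_bypass`, and `DEFAULT_BYPASS_DOMAINS` are hypothetical names, and the hostname in the default list is an assumption used only as an example.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# Hypothetical default list; the hostname is an assumption for illustration.
DEFAULT_BYPASS_DOMAINS = frozenset({"autoextract.scrapinghub.com"})


class DomainBypass:
    """Decides whether a request URL should skip the proxy (hypothetical helper)."""

    def __init__(self, domains=DEFAULT_BYPASS_DOMAINS):
        self.domains = frozenset(d.lower() for d in domains)
        self._warned = False  # so the warning is logged only once per instance

    def should_bypass(self, url):
        host = (urlparse(url).hostname or "").lower()
        # Match the blacklisted domain itself or any of its subdomains.
        matched = any(host == d or host.endswith("." + d) for d in self.domains)
        if matched and not self._warned:
            logger.warning(
                "Request to %s is not routed through the proxy because its "
                "domain is blacklisted; further occurrences are not logged.",
                url,
            )
            self._warned = True
        return matched
```

A middleware could create one such object per crawl and call `should_bypass(request.url)` before applying the proxy, which would avoid adding a new setting. Whether the list should be overridable is a separate design question.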
