pricyproject/pricy-crawler

Pricy Crawler

Usage:

Help command:

cargo run -- -h

The Pricy bot is fully modular, so adding a new shop is straightforward. To list the available shops:

cargo run -- -l
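
To illustrate what "modular" could mean here, the following is a minimal sketch of a shop registry. It is not taken from the Pricy source: the `Shop` trait, its `name` and `domain` methods, and the `Registry` type are all hypothetical names chosen for illustration. Only the identifier `johnlewis_com` and the host `www.johnlewis.com` appear in this README.

```rust
use std::collections::BTreeMap;

/// Hypothetical interface a shop module might implement.
/// None of these names come from the Pricy source.
trait Shop {
    /// Identifier passed to `-s`, e.g. "johnlewis_com".
    fn name(&self) -> &'static str;
    /// Host the shop's product pages live on.
    fn domain(&self) -> &'static str;
}

/// Example shop module (illustrative only).
struct JohnLewis;

impl Shop for JohnLewis {
    fn name(&self) -> &'static str {
        "johnlewis_com"
    }
    fn domain(&self) -> &'static str {
        "www.johnlewis.com"
    }
}

/// Hypothetical registry: adding a shop means registering one more value.
struct Registry {
    shops: BTreeMap<&'static str, Box<dyn Shop>>,
}

impl Registry {
    fn new() -> Self {
        Registry { shops: BTreeMap::new() }
    }

    /// Stores the shop under its own identifier.
    fn register(&mut self, shop: Box<dyn Shop>) {
        self.shops.insert(shop.name(), shop);
    }

    /// Sorted list of identifiers, similar in spirit to what `-l` prints.
    fn names(&self) -> Vec<&'static str> {
        self.shops.keys().copied().collect()
    }

    /// Looks up the host for a shop identifier, if it is registered.
    fn domain_of(&self, name: &str) -> Option<&'static str> {
        self.shops.get(name).map(|shop| shop.domain())
    }
}
```

With a design like this, supporting another shop would only require a new type implementing `Shop` and one `register` call. The real crawler may be organized differently.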

To make the JSON output more readable, pipe it through the jq command. You can install it from here.

The following command starts scraping johnlewis.com and pretty-prints the output with jq:

cargo run -- -s johnlewis_com | jq '.'

Crawl only 10 products from a shop:

cargo run -- -s johnlewis_com --limit-products 10

Filter products by keyword:

cargo run -- -s johnlewis_com --filter-keyword original

Filter products by URL:

cargo run -- -s johnlewis_com --filter-url /p54

Combine filters:

cargo run -- -s johnlewis_com --filter-url /p54 --filter-keyword original
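
The README does not say how the two filters are matched or combined. The sketch below shows one plausible reading, under these assumptions: `--filter-url` is a substring match on the product URL, `--filter-keyword` is a case-insensitive substring match on the product title, and a product must satisfy every filter that is set. The `Filters` type and its field names are hypothetical.

```rust
/// Hypothetical filter settings mirroring `--filter-url` and `--filter-keyword`.
/// A field left as `None` means the corresponding flag was not given.
#[derive(Default)]
struct Filters {
    url: Option<String>,
    keyword: Option<String>,
}

impl Filters {
    /// Returns true when the product satisfies every filter that is set
    /// (logical AND). Matching rules are assumptions, not Pricy's behavior.
    fn passes(&self, product_url: &str, product_title: &str) -> bool {
        // Assumed: plain substring match on the URL.
        let url_ok = self
            .url
            .as_deref()
            .map_or(true, |needle| product_url.contains(needle));

        // Assumed: case-insensitive substring match on the title.
        let keyword_ok = self.keyword.as_deref().map_or(true, |needle| {
            product_title
                .to_lowercase()
                .contains(&needle.to_lowercase())
        });

        url_ok && keyword_ok
    }
}
```

Under these assumptions, a `Filters` value with both fields unset lets every product through, and setting both narrows the result to products that match the URL fragment and the keyword at the same time.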

Customize your request:

Use a custom user-agent:

cargo run -- -s johnlewis_com --user-agent "MyStrong Bot/1.0.0"

Use a custom proxy:

cargo run -- -s johnlewis_com --proxy http://localhost:3001

Save sitemap links to storage:

cargo run -- -s johnlewis_com --save-sitemap

If the sitemap is in gzip format:

cargo run -- -s yourshop_com --save-sitemap --gzip true

Crawl a single product:

echo https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p5548442 | cargo run -- -p

Crawl multiple products from different shops:

cargo run -- -m https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p5548442,https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p552242
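
Since `-m` takes a comma-separated list of URLs that may belong to different shops, the crawler has to decide which shop handles each URL. The following is a minimal sketch of one way to do that with the standard library only. The functions `host_of`, `shop_key`, and `group_by_shop` are hypothetical, and the rule that maps a host to a shop identifier (drop a leading `www.`, replace dots with underscores) is an assumption inferred from the `johnlewis_com` name, not confirmed by the source.

```rust
use std::collections::BTreeMap;

/// Extracts the host from an absolute http(s) URL without external crates.
/// Ports and user info are not handled; this is a simplification.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The host ends at the first path, query, or fragment delimiter.
    let host = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Assumed mapping from host to shop identifier:
/// "www.johnlewis.com" becomes "johnlewis_com".
fn shop_key(host: &str) -> String {
    host.strip_prefix("www.").unwrap_or(host).replace('.', "_")
}

/// Splits a comma-separated `-m` style argument and groups the URLs by shop.
/// Entries that are empty or not http(s) URLs are skipped.
fn group_by_shop(arg: &str) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for url in arg.split(',').map(str::trim).filter(|u| !u.is_empty()) {
        if let Some(host) = host_of(url) {
            groups
                .entry(shop_key(host))
                .or_default()
                .push(url.to_string());
        }
    }
    groups
}
```

Grouping first would let each shop module process its own URLs in one pass. Whether Pricy dispatches this way is not stated in the README.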

What is the format of the crawled data? See the sample.

About

Pricy is a product crawler and search engine. This crawler helps you collect product data from the internet. It's easy to customize.