Help command:
cargo run -- -h
Pricy bot is fully modular, so you can easily add a new shop to it. List the available shops:
cargo run -- -l
To make the JSON output more readable, use the jq command. You can install it from here.
The following command shows how the crawler starts scraping johnlewis.com.
cargo run -- -s johnlewis_com | jq '.'
Crawl only 10 products from a shop:
cargo run -- -s johnlewis_com --limit-products 10
Filter products by keyword:
cargo run -- -s johnlewis_com --filter-keyword original
Filter products by URL:
cargo run -- -s johnlewis_com --filter-url /p54
Combine filters:
cargo run -- -s johnlewis_com --filter-url /p54 --filter-keyword original
Use a custom user-agent:
cargo run -- -s johnlewis_com --user-agent "MyStrong Bot/1.0.0"
Use a custom proxy:
cargo run -- -s johnlewis_com --proxy http://localhost:3001
Save the sitemap of a shop:
cargo run -- -s johnlewis_com --save-sitemap
- If the sitemap is in gzip format:
cargo run -- -s yourshop_com --save-sitemap --gzip true
Crawl a single product:
echo https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p5548442 | cargo run -- -p
Crawl multiple products from different shops:
cargo run -- -m https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p5548442,https://www.johnlewis.com/john-lewis-partners-jl111-wildflower-print-sewing-machine-blue/p552242
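Typing a long comma-separated list for -m by hand is error-prone. As a minimal sketch, the helper below builds that list from a text file with one product URL per line. The function name join_urls and the file name urls.txt are hypothetical and not part of Pricy bot; only the -m flag comes from the usage above.

```shell
# Hypothetical helper (not part of Pricy bot): joins the non-blank lines
# of a file into a single comma-separated string, the shape -m expects.
join_urls() {
  # Drop blank lines, then merge the remaining lines with commas.
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Assumed usage, with urls.txt holding one product URL per line:
#   cargo run -- -m "$(join_urls urls.txt)"
```

Quoting the command substitution keeps the whole list as a single argument. URLs that themselves contain commas would break this approach, since the comma is the separator.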
What is the format of crawled data? Sample