Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add max_items #129

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

serhii73
Copy link
Contributor

@serhii73 serhii73 commented Jun 1, 2021

Close #107

@pawelmhm
Copy link
Member

pawelmhm commented Jun 2, 2021

Thank you for this contribution @serhii73. I really appreciate it!

The idea behind this feature is to limit number of items in output. With current implementation it doesn't really do the trick. You can test it by running curl

> curl "http://localhost:9080/crawl.json?url=http://quotes.toscrape.com&spider_name=toscrape-css&max_items=1" -v | jq '.items'

I passed max_items=1 but I got 10 items.

This is because limit_items function is not called when item is scraped. To actually call it when item is scraped you need to plug it into signal item_scraped. There is already one handler registered for this signal it is method "get_item". If you move your code counting items there it should do the trick.

👍 for adding test, test is ok, but it would be also good to add one integration test in test_resource_crawl.TestCrawlResourceIntegration passing max_items parameter to API and checking if it actually works as expected. It is possible that some argument will be work fine in crawl manager but will be filtered earlier in HTTP resource handling. This is why it is recommended to add integration test.

Finally, when we merge it we'll need to also update documentation with this parameter so that people know how to use it.

@serhii73 serhii73 changed the title Add max_items [WIP] Add max_items Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Requesting max_items in scrapyrt
2 participants