AraCoders opened this issue
Jan 22, 2024
· 0 comments
Labels
feature: Issues that represent new features or improvements to existing features.
t-tooling: Issues with this label are in the ownership of the tooling team.
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
PlaywrightCrawler's enqueueLinks has two properties: "regexps" and "exclude". However, only "regexps" is present for "enqueueLinksByClickingElements".
Motivation
Consistency between enqueueLinks and enqueueLinksByClickingElements. I define my crawl scope using "regexps", but I frequently need to filter out some URLs (add them to a blacklist). With enqueueLinks that's easy. With enqueueLinksByClickingElements I had to provide two regexps: one for the normal scope of the crawler, and a second one using a negative lookbehind to filter out some of the URLs. However, I think it's still not working as expected: some URLs are rejected by the first regex but still make it into the enqueued requests because they match the second (negative lookbehind) regex.
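To make the problem concrete, here is a minimal standalone sketch. It assumes that the entries in "regexps" are combined with "or" (a URL is enqueued if any one pattern matches), which is what the behavior described above suggests but is not confirmed here. The patterns, URLs, and the `matchesAny` helper are hypothetical and are not part of Crawlee. Note that the combined pattern uses a negative lookahead rather than the lookbehind mentioned above.

```typescript
// Assumption: a URL is accepted when ANY pattern matches ("or" semantics).
function matchesAny(url: string, regexps: RegExp[]): boolean {
    return regexps.some((re) => re.test(url));
}

// Hypothetical crawl scope: only pages under /docs/ on example.com.
const scope = /^https:\/\/example\.com\/docs\//;
// Hypothetical "blacklist" written as a negative pattern:
// matches any URL that does NOT contain "/private/".
const notPrivate = /^(?!.*\/private\/).*$/;

// Two separate patterns: under "or" semantics each one lets unwanted URLs through.
const twoPatterns = [scope, notPrivate];
const leakedPrivate = matchesAny('https://example.com/docs/private/a', twoPatterns); // accepted via `scope`
const leakedOffScope = matchesAny('https://other.com/page', twoPatterns); // accepted via `notPrivate`

// Possible workaround: fold the negative condition into the scope pattern,
// so a single regex has to satisfy both conditions at once.
const combined = /^(?!.*\/private\/)https:\/\/example\.com\/docs\/.*$/;
const keptDocs = matchesAny('https://example.com/docs/intro', [combined]);
const droppedPrivate = matchesAny('https://example.com/docs/private/a', [combined]);
const droppedOffScope = matchesAny('https://other.com/page', [combined]);
```

A single combined pattern gets harder to read as the blacklist grows, which is part of why a separate "exclude" option would be more convenient.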
Ideal solution or implementation, and any additional constraints
Add the "exclude" property to "enqueueLinksByClickingElements". Also make it clear in the docs whether the list of regexes supplied to the "regexps" property works in an "and" or an "or" relationship, and likewise how "regexps" and "exclude" relate to each other when both are supplied.
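One possible reading of the requested semantics, as a sketch only: include patterns are combined with "or", and any "exclude" match rejects the URL regardless of what "regexps" says. The `UrlFilter` type and `shouldEnqueue` function are hypothetical names used for illustration. They are not Crawlee's implementation, and the sketch only handles RegExp patterns.

```typescript
// Hypothetical option shape, mirroring the two properties discussed in this issue.
interface UrlFilter {
    regexps: RegExp[];  // include patterns; assumed to be combined with "or"
    exclude?: RegExp[]; // blacklist patterns; assumed to win over any include match
}

function shouldEnqueue(url: string, { regexps, exclude = [] }: UrlFilter): boolean {
    // Assumption: an empty include list means "no restriction".
    const included = regexps.length === 0 || regexps.some((re) => re.test(url));
    const excluded = exclude.some((re) => re.test(url));
    return included && !excluded;
}

// Hypothetical usage: crawl /docs/ on example.com but skip anything under /private/.
const filter: UrlFilter = {
    regexps: [/^https:\/\/example\.com\/docs\//],
    exclude: [/\/private\//],
};
const docsAllowed = shouldEnqueue('https://example.com/docs/intro', filter);
const privateBlocked = shouldEnqueue('https://example.com/docs/private/a', filter);
const offScopeBlocked = shouldEnqueue('https://other.com/page', filter);
```

Whichever semantics are chosen, stating them in the docs in roughly this form would answer the "and"/"or" question directly.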
Alternative solutions or implementations
No response
Other context
No response