Are links with empty href ignored? (button links handled by page js) #373

Open
YuriGor opened this issue Oct 16, 2020 · 1 comment

YuriGor commented Oct 16, 2020

What is the current behavior?
It looks like the crawler does not call preRequest for links with an empty href attribute.

If the current behavior is a bug, please provide the steps to reproduce

const HCCrawler = require('headless-chrome-crawler');
const seedUrl = 'https://en.comparis.ch/gesundheit/arzt/search?searchcat=doctor';
const capUrl = 'https://en.comparis.ch/gesundheit/arzt';

const testUrl = (url) => !url || url.startsWith(capUrl);
HCCrawler.launch({
  obeyRobotsTxt: false,
  args: ['--disable-web-security'],
  maxDepth: 2,
  // Log every candidate URL: '+' if it passes testUrl, '-' if it is rejected.
  preRequest: (options) => {
    const allowed = testUrl(options.url);
    console.log(`${allowed ? '+' : '-'} [${options.url}]`);
    return allowed;
  },
  evaluatePage: () => ({ text: window.document.body.innerText }),
  onSuccess: (result) => {
    // console.log(` === ${result.options.url} === `);
  },
})
  .then((crawler) => {
    crawler.queue(seedUrl);
    crawler.onIdle()
      .then(() => crawler.close());
  });

What is the expected behavior?

I expect the console log to show that links with an empty URL, for example pagination buttons, are also passed to preRequest.

What is the motivation / use case for changing the behavior?
To be able to navigate dynamic sites where links with an empty href attribute are handled by the page's JavaScript.
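
For context, here is a minimal sketch of one possible reason such links never show up. Under the standard URL API, an empty href resolves to the URL of the current page, so a crawler that skips already-visited URLs would drop it before any callback runs. Whether headless-chrome-crawler actually works this way is an assumption and is not confirmed in this thread; `resolveHref`, `pageUrl`, `hrefs`, and the `visited` set are hypothetical names used only for illustration.

```javascript
// Sketch only: this is NOT the crawler's own code.
// It shows how the standard URL API resolves an empty href against a page URL.

// Hypothetical helper: resolve a link's href relative to the page it was found on.
function resolveHref(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Illustrative values (assumptions, not taken from the site in this issue).
const pageUrl = 'https://example.com/search?page=1';
const hrefs = ['', '/search?page=2'];

const resolved = hrefs.map((href) => resolveHref(href, pageUrl));

// A Set stands in for a "have we already requested this URL?" check.
const visited = new Set([pageUrl]);
const fresh = resolved.filter((url) => !visited.has(url));

console.log(resolved); // the empty href resolves to pageUrl itself
console.log(fresh);    // only URLs that differ from the visited page remain
```

If this is indeed the cause, buttons like these would probably have to be triggered by clicking them inside the page rather than by queueing their URL.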

Please tell us about your environment:

  • Version: 1.8.0
  • Platform / OS version: Ubuntu / 20.04
  • Node.js version: v14.0.0

@kulikalov (Contributor)

Hi @YuriGor! Could you provide another target to reproduce the issue? It looks like https://en.comparis.ch is down.
