Retrieve a single url in headless mode #488

Closed Answered by olearycrew
Sim4n6 asked this question in Q&A

@Sim4n6 thanks for your question! I think I've developed a command to get you what you want. The command is:

katana -list urls.txt -d 2 -jc -hl -sr -mr '(.*)\.js'

Where:

  • -list urls.txt: Uses your URL list - the format here shouldn't matter
  • -d 2: Sets depth to 2, so you only crawl pages directly from the URLs you sent
  • -jc: Sets JavaScript crawling on (optional) to crawl within those JS files
  • -hl: Runs the crawl in headless mode
  • -sr: Saves responses (optional)
  • -mr '(.*)\.js': Matches a regex so that only URLs containing .js are reported - you could modify this regex as needed, for example '\.js$' to require that the URL ends in .js
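
To preview what a pattern will keep before running a crawl, one option is to try it against a few URLs with grep. This is only a sketch: the URLs are made-up examples, and grep -E uses POSIX extended regex rather than the Go regex engine katana is built on, though for patterns this simple the two should agree.

```shell
# Hypothetical sample URLs, standing in for what a crawl might discover.
urls='https://example.com/app.js
https://example.com/data.json
https://example.com/index.html
https://example.com/lib/vendor.min.js'

# Pattern from the command above: unanchored, so ".js" may appear anywhere.
loose=$(printf '%s\n' "$urls" | grep -E '(.*)\.js')

# Anchored variant: ".js" must be the last thing in the URL.
strict=$(printf '%s\n' "$urls" | grep -E '\.js$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The unanchored pattern also keeps data.json, while the anchored one keeps only the two .js files. Note that the anchored form would skip a URL such as app.js?v=1, so loosen it if query strings matter to you.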

I hope this helps - and even if it isn't perfect, it at least gives you some idea of what you can do!

Answer selected by Sim4n6