[FEAT] Website depth scraping data connector #1191

Merged
9 commits merged on May 14, 2024

@shatfield4 (Collaborator) commented Apr 26, 2024

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #1190

What is in this change?

  • Adds a data connector that follows links on a site and scrapes pages up to a user-specified depth of links
  • Only follows links whose domain matches the starting site's domain, so only pages on the same website are scraped
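The two bullets above describe a depth-limited crawl restricted to a single domain. The PR's own code is not reproduced here. What follows is a minimal Python sketch of that technique under stated assumptions. `discover_links`, `HrefCollector` and the `fetch` callback are hypothetical names. Depth 0 means only the starting page. "Same domain" is taken to mean an exact match on the URL's host, so `www.example.com` and `example.com` count as different sites. The `fetch` callback is injected so the sketch runs without network access; real use would put an HTTP client there. Pages are visited breadth-first, so each link is recorded at its shallowest depth.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(start_url: str, max_depth: int, fetch: Callable[[str], str]) -> List[str]:
    """Return same-host URLs reachable from start_url within max_depth link hops.

    Hypothetical helper: `fetch` maps a URL to its HTML (an HTTP client in real use).
    Depth 0 returns only start_url; each extra level follows one more hop of links.
    """
    start = urldefrag(start_url).url
    domain = urlparse(start).netloc
    seen = {start}
    ordered = [start]
    queue = deque([(start, 0)])  # BFS queue of (url, depth)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept but not expanded
        parser = HrefCollector()
        parser.feed(fetch(url))
        parser.close()
        for href in parser.hrefs:
            # Resolve relative hrefs and drop #fragments so duplicates collapse.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            # Skip mailto:/javascript: links and anything on another host.
            if parts.scheme not in ("http", "https") or parts.netloc != domain:
                continue
            if link not in seen:
                seen.add(link)
                ordered.append(link)
                queue.append((link, depth + 1))
    return ordered


# Usage with an in-memory "site" so the sketch runs without network access.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/x">off-site</a>',
    "https://example.com/a": '<a href="b#top">B</a> <a href="/">home</a>',
    "https://example.com/b": '<a href="/c">C</a>',
}
print(discover_links("https://example.com/", 2, lambda u: site.get(u, "")))
# → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

With a depth of 2, `/c` is not returned. It is only linked from `/b`, which sits at the depth limit and so is not expanded. The off-site link to `other.org` is dropped by the host check.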

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@shatfield4 linked an issue Apr 26, 2024 that may be closed by this pull request
@shatfield4 changed the title from "WIP website depth scraping, (sort of works)" to "[FEAT] Website depth scraping data connector" Apr 26, 2024
@shatfield4 self-assigned this Apr 26, 2024
@shatfield4 marked this pull request as ready for review April 26, 2024 21:52
@shatfield4 (Collaborator, Author) commented:

@timothycarambat, refactored based on what we discussed.

  • Collects all links into an array first, so the total number of links is known before the main scraping starts
  • Passes that array to the bulk scraping function
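The refactor described above splits the work into two phases: discover every link, then scrape them in bulk. Below is a minimal Python sketch of the second phase. It assumes a hypothetical `bulk_scrape` function that takes the pre-collected list and a hypothetical per-page `scrape_one` callback. Because the list exists before scraping begins, progress can be reported as done/total. Continuing past a failed page is a design choice of this sketch and is not something the PR states.

```python
from typing import Callable, Dict, List, Optional, Tuple


def bulk_scrape(
    links: List[str],
    scrape_one: Callable[[str], str],
    on_progress: Optional[Callable[[int, int, str], None]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Scrape a pre-collected list of URLs; return (scraped text by URL, failed URLs).

    Hypothetical helper: because `links` is complete before scraping begins,
    the total is known up front and progress can be reported as done/total.
    """
    total = len(links)
    scraped: Dict[str, str] = {}
    failed: List[str] = []
    for done, url in enumerate(links, start=1):
        try:
            scraped[url] = scrape_one(url)
        except Exception:
            failed.append(url)  # record the failure and keep going with the rest
        if on_progress is not None:
            on_progress(done, total, url)
    return scraped, failed


# Usage with a fake scraper so the sketch runs without network access.
def fake_scrape(url: str) -> str:
    if url.endswith("/broken"):
        raise IOError("HTTP 500")
    return f"text of {url}"


links = ["https://example.com/", "https://example.com/a", "https://example.com/broken"]
scraped, failed = bulk_scrape(links, fake_scrape, lambda d, t, u: print(f"[{d}/{t}] {u}"))
print(failed)  # → ['https://example.com/broken']
```

The progress callback prints `[1/3]`, `[2/3]` and `[3/3]` in turn. Printing a denominator like this is only possible because the link list was built before the scraping loop started.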

@timothycarambat merged commit 612a7e1 into master May 14, 2024
@timothycarambat deleted the 1190-feat-website-scraping-depth branch May 14, 2024 19:49
Development

Successfully merging this pull request may close these issues.

[FEAT]: Website scraping depth
2 participants