# web-data-extraction

Here are 20 public repositories matching this topic...

dariga-sm / Word-Frequency-in-Moby-Dick

Scrape the novel Moby Dick from Project Gutenberg using the Python package requests, extract words from the web data using BeautifulSoup, and analyze the distribution of words using the Natural Language Toolkit (nltk).

python requests beautifulsoup nlp-machine-learning case-study web-data-extraction

Updated Oct 21, 2019
HTML
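
The pipeline described for Word-Frequency-in-Moby-Dick (get HTML, extract the text, count words) can be sketched roughly as follows. This is not the repository's code: to stay self-contained it swaps in the standard library's `html.parser` and `collections.Counter` for requests, BeautifulSoup, and nltk, and it runs on a short inline string rather than the Project Gutenberg page. The names `TextExtractor`, `word_frequencies`, and `sample_html` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_frequencies(html, stopwords=frozenset()):
    """Return a Counter of lowercase words found in the HTML text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Words are runs of letters, optionally with one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(w for w in words if w not in stopwords)


# Stand-in for a downloaded page (assumption: real input would come from an HTTP request).
sample_html = (
    "<html><body><p>Call me Ishmael. Call me.</p>"
    "<script>var x = 1;</script></body></html>"
)
counts = word_frequencies(sample_html, stopwords={"me"})
print(counts.most_common(2))
```

With the real libraries, the fetch step would be an HTTP GET, and the stopword set would typically come from nltk's English stopword list.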

gonzalopezgil / scraping-interface

Python-based desktop app for effortless web scraping

desktop-app python cross-platform pyqt5 web-scraping xpath web-data-extraction browsing web-pages user-friendly-interface

Updated Jun 26, 2023
Python

sc10ntech / extract-site-metadata

Metadata extractor for the sprawling web ⚙️

metadata-extraction web-data-extraction open-graph-protocol

Updated Jan 8, 2023
TypeScript

wbsg-uni-mannheim / StructuredDataProfiler

Java project for profiling the results of the yearly Web Data Commons extraction of structured data with RDFa, Microdata, Microformat, and Embedded JSON-LD annotations.

schema-org json-ld microdata profiling web-data-extraction

Updated Oct 17, 2022
Java

chelvanai / Web-data-scrap

Web data scraping with Scrapy

scrapy web-data-extraction

Updated Nov 13, 2019
Python

wbsg-uni-mannheim / wdc-page

This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.

web-data-extraction

Updated Mar 15, 2024
HTML

wbsg-uni-mannheim / schemaorg-tables

This repository contains the code and data download links to reproduce the building process of the 2021 Schema.org Table Corpus.

schema-org web-data-extraction web-tables

Updated May 12, 2021
Python

lekhmanrus / real-shot-pdf

RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.

Updated Mar 1, 2024
TypeScript

oxpath / oxpath

OXPath from Oxford

scraper web ajax web-data-extraction

Updated May 20, 2022
Java

hoxhaeris / get_muitiple

Get and process multiple resources from the web, using asyncio (aiohttp) to fetch the data and multiprocessing/multithreading to process it.

python3 web-scraping asyncio web-data-extraction

Updated Mar 4, 2021
Python
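
The fetch-then-process split described for get_muitiple can be illustrated with a minimal sketch. It is not the repository's code: `fetch` is a placeholder coroutine that simulates network latency instead of calling aiohttp, `process` is a trivial stand-in for real work, and a thread pool is used for the processing stage (a process pool would be the alternative for CPU-bound work). All function names and URLs here are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch(url):
    # Placeholder for a real HTTP request; only simulates waiting on the network.
    await asyncio.sleep(0.01)
    return f"<title>{url}</title>"


def process(body):
    # Stand-in for per-response work: strip the title tags and uppercase the text.
    return body.replace("<title>", "").replace("</title>", "").upper()


async def fetch_and_process(urls, max_workers=4):
    loop = asyncio.get_running_loop()
    # Stage 1: fetch all resources concurrently on the event loop.
    bodies = await asyncio.gather(*(fetch(u) for u in urls))
    # Stage 2: hand each body to a worker thread so processing does not block the loop.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, process, b) for b in bodies]
        return await asyncio.gather(*tasks)


results = asyncio.run(fetch_and_process(["a.example", "b.example"]))
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, so the output lines up with the input URLs.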

ranajahanzaib / wdx

A web data extraction library written in golang.

scraper mongodb nextjs web-data-extraction go-scraper

Updated Apr 19, 2024
Go

wbsg-uni-mannheim / WDCFramework

Java framework used by the Web Data Commons project to extract Microdata, Microformats, and RDFa data, web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.

schema-org json-ld microdata web-data-extraction

Updated Dec 13, 2022
Java

dstark5 / gnews-scraper

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making it convenient to access and use the scraped information.

typescript web-scraping json-parsing web-crawling google-news data-scraping google-news-scraper web-data-extraction web-automation keyword-search gnews news-scraping gnews-api article-extraction gnews-scraper

Updated Aug 19, 2023
TypeScript

kaizenplatform / FacebookInsightsConnector

The Tableau Web Data Connector for Facebook Insights API

facebook tableau facebook-insights web-data-extraction

Updated Jun 26, 2017
JavaScript

Boomslet / Web_Crawler

Open-source web crawler

python url html open-source website opensource links web-crawler urls free data-extraction webcrawler web-crawling web-data-extraction urllib web-crawler-python

Updated Jul 21, 2018
Python

luminati-io / java-web-scraping

Quick guide, with a code example, on how to use Java for web scraping

java maven scraping-websites web-data-extraction

Updated Nov 29, 2022

DemonMartin / scrappey-wrapper

An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass & solver)

web-scraping data-extraction web-data-extraction scraping-framework scraping-tool cloudflare-bypass web-scraping-solution cloudflare-solver api-scraping scraping-solution website-data-extraction scraping-library cloudflare-anti-bot scraping-service data-scraping-tool website-scraping-tool turnstile-solver

Updated Jan 10, 2024
JavaScript

jjonescz / awe

AI-based web extractor

deep-learning information-extraction web-scraping web-data-extraction structured-web-data

Updated Feb 25, 2023
Python

codercurious / crunchbase-scraper

Scrape Crunchbase data on companies, people, investors, and acquisitions, including website URLs, social URLs, emails, phone numbers, employee counts, and funding information.

leads crunchbase investors web-scrapers web-data-extraction lead-generation scraping-web scraper-api crunchbase-api crunchbase-scraper company-scraper leads-scraper

Updated Jan 15, 2024

MohamedHmini / iww

AI-based web wrapper for web content extraction

python data-mining library ai information-extraction web-scraping web-mining web-content-extractor web-data-extraction

Updated Feb 6, 2023
Python
