Scraping Interface

The Scraping Interface project is a cross-platform desktop application developed using Python and the PyQt5 library. It provides a user-friendly interface for web scraping, allowing users to extract information from web pages easily.

✨ Features

Web Scraping: Extract online data using a browser-like interface.
Dynamic Browsing: Browse the web with Chromium and perform standard actions like navigation, page reloads and searching.
XPath Selection: Highlight and select elements on web pages using generalized XPath expressions.
Table Preview: Select data from sites and view it in a table format for easy extraction.
Pagination Support: Extract data from multiple pages with consistent structures, including automatic handling of pagination buttons.
Data Export: Save scraped data in popular formats such as Excel, CSV, JSON, or XML.
Template Management: Save and load scraping configurations for reuse, allowing quick access to previously configured selections.
Authentication Support: Securely store and use encrypted login credentials to access authenticated web pages.
CAPTCHA Handling: Handle CAPTCHA-protected pages so that data extraction can continue.
Process Monitoring: Track and manage scraping processes with progress indicators for ongoing tasks.
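
To illustrate what a "generalized" XPath expression can mean, here is a minimal sketch, not the application's actual implementation. It assumes two absolute XPaths taken from two similar elements (for example, two rows of the same list) and drops the positional index wherever they differ, so the result matches every sibling. The function name generalize_xpath is hypothetical.

```python
import re

# One XPath step: a tag name with an optional positional index, e.g. "li[3]".
_STEP = re.compile(r"^(?P<tag>[^\[\]]+)(?:\[(?P<index>\d+)\])?$")


def generalize_xpath(first: str, second: str) -> str:
    """Merge two absolute XPaths into one that matches both elements.

    Hypothetical helper: steps that are identical are kept as-is, and
    steps that differ only by index lose their index.
    """
    steps_a = first.strip("/").split("/")
    steps_b = second.strip("/").split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("XPaths have different depths")

    merged = []
    for a, b in zip(steps_a, steps_b):
        if a == b:
            merged.append(a)
            continue
        match_a, match_b = _STEP.match(a), _STEP.match(b)
        if not match_a or not match_b or match_a["tag"] != match_b["tag"]:
            raise ValueError(f"Steps {a!r} and {b!r} cannot be merged")
        # Same tag, different index: drop the index to match all siblings.
        merged.append(match_a["tag"])
    return "/" + "/".join(merged)


if __name__ == "__main__":
    print(generalize_xpath(
        "/html/body/div[2]/ul/li[1]/a",
        "/html/body/div[2]/ul/li[4]/a",
    ))
```

Selecting the first and fourth links of a list this way yields /html/body/div[2]/ul/li/a, which selects the link in every list item.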

🚀 Installation

To install and run the Scraping Interface application from source, follow these steps:

Clone the repository and change into its directory:

git clone https://github.com/gonzalopezgil/scraping-interface.git
cd scraping-interface

Install the required dependencies:

pip install -r requirements.txt

Run the application:

python main.py

🛠️ Usage

Launch the application. The interface has four tabs: Home, Browser, Processes, and Settings.
Use the Browser tab to navigate, search, and interact with sites.
When you're ready to extract data, navigate to the desired web page and click the "Scrape" button. The program will display the extracted data in a table for preview.
Customize your selection using generalized XPath expressions and modify the table as needed.
Configure pagination settings, save templates for future use, and choose the data export format for each process.
Monitor the scraping processes in the Processes tab, and manage them by stopping them, stepping in to complete manual actions, or opening the output files.
Adjust application settings, including browser preferences and language, in the Settings tab.
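
As a rough illustration of the export step, the sketch below turns rows from a preview table into CSV and JSON text using only the Python standard library. It is an assumption about how such an export could work, not the application's own code; the function names rows_to_csv and rows_to_json and the sample data are hypothetical.

```python
import csv
import io
import json


def rows_to_csv(columns, rows):
    """Serialize table rows to CSV text with a header line."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps line endings the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(columns, rows):
    """Serialize table rows to a JSON array of objects keyed by column name."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical scraped data: one column per selected element.
    columns = ["title", "price"]
    rows = [["Book", "9.99"], ["Lamp", "24.50"]]
    print(rows_to_csv(columns, rows))
    print(rows_to_json(columns, rows))
```

Returning text instead of writing files directly keeps the sketch easy to test; a caller could write the result with open(path, "w", encoding="utf-8").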

Contributing

Contributions to the Scraping Interface project are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request.

📃 License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code.
