Crawlab AI SDK

This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.

Installation

pip install crawlab-ai

Pre-requisites

An API token is required to use this SDK. You can get the API token from the Crawlab official website.

Usage

Get data from a list page

from crawlab_ai import read_list

# Define the URL and fields
url = "https://example.com"

# Get the data without specifying fields
df = read_list(url=url)
print(df)

# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)

# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)

Usage with Scrapy

Create a Scrapy spider by extending ScrapyListSpider:

from crawlab_ai import ScrapyListSpider


class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]

Then run the spider:

scrapy crawl my_spider

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
crawlab_ai		crawlab_ai
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github/workflows

.github/workflows

crawlab_ai

crawlab_ai

test

test

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Crawlab AI SDK

Installation

Pre-requisites

Usage

Get data from a list page

Usage with Scrapy

About

Releases

Packages

Contributors 2

Languages

License

crawlab-team/crawlab-ai-sdk

Folders and files

Latest commit

History

Repository files navigation

Crawlab AI SDK

Installation

Pre-requisites

Usage

Get data from a list page

Usage with Scrapy

About

Resources

License

Stars

Watchers

Forks

Languages