Project: Credly Crawler (WIP)

This project consists of Python scripts designed to crawl and extract data from the Credly platform. The main components of the project are:

crawl-by-arg.py
crawl-by-search-terms.py
crawl-by-skills.py
get-badges.py
helper.py

Requirements

Python 3.x
requests library

Install the requirements using the following command:

pip install requests

Usage

1. crawl-by-arg.py

This script crawls the Credly platform using a single search term passed as a command-line argument.

Usage:

python crawl-by-arg.py <search_term>
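
How the script reads its argument is not shown here. As a minimal sketch, assuming it takes exactly one positional argument from an argv-style list, the validation could look like the following. The function name parse_search_term is hypothetical and is not taken from the repository.

```python
def parse_search_term(argv):
    """Return the single search term from an argv-style list.

    Hypothetical helper: argv[0] is the script name and argv[1] is the term,
    mirroring the layout of sys.argv.
    """
    if len(argv) != 2:
        raise ValueError("usage: python crawl-by-arg.py <search_term>")
    term = argv[1].strip()
    if not term:
        raise ValueError("search term must not be empty")
    return term


# Example with an explicit list; a real script would pass sys.argv instead.
print(parse_search_term(["crawl-by-arg.py", "python"]))
```

A search term that contains spaces would need to be quoted on the command line so that the shell passes it as a single argument.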

2. crawl-by-search-terms.py

This script crawls the Credly platform using a list of search terms specified in the data/search-terms.json file.

Usage:

python crawl-by-search-terms.py

3. crawl-by-skills.py

This script crawls the Credly platform using the list of skills read from the data/skills.json file.

Usage:

python crawl-by-skills.py

4. get-badges.py

This script retrieves all badges for each organization specified in the data/organizations.json file. The badges are then saved to the data/badges.json file.

Usage:

python get-badges.py
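
The flow described above (read organizations, fetch badges for each one, write the result) can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the layout of organizations.json is assumed to be a JSON object keyed by organization id, and fetch_badges is a hypothetical callable standing in for whatever request the real script makes to Credly.

```python
import json
from pathlib import Path


def collect_badges(organizations, fetch_badges):
    """Map each organization id to the list of badges returned by fetch_badges."""
    return {org_id: list(fetch_badges(org_id)) for org_id in organizations}


def run(data_dir, fetch_badges):
    """Read organizations.json, collect badges, and write badges.json."""
    data_dir = Path(data_dir)
    organizations = json.loads(
        (data_dir / "organizations.json").read_text(encoding="utf-8")
    )
    badges = collect_badges(organizations, fetch_badges)
    (data_dir / "badges.json").write_text(
        json.dumps(badges, indent=2), encoding="utf-8"
    )
    return badges
```

Passing the fetch function in as a parameter keeps the file handling testable without network access; the real script may well be organized differently.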

5. helper.py

This module contains helper functions shared by the other scripts in this project. Functions include:

get_skills_file()
get_organizations_file()
get_badges_file()
get_search_terms_file()
get_items_by_search_term(search_term)
search_terms()
get_items_from_file(file_name)
set_items_from_file(file_name, items)
crawl_search_terms(terms)
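
The bodies of these functions are not reproduced here. As a rough sketch of what the two file helpers might do, assuming the data files are plain JSON stored under the data directory, one plausible shape is shown below. The data_dir parameter and the DATA_DIR constant are additions for illustration and are not part of the signatures listed above.

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # assumption: data files live in ./data


def get_items_from_file(file_name, data_dir=DATA_DIR):
    """Load a JSON file from the data directory; return {} if it is missing."""
    path = Path(data_dir) / file_name
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def set_items_from_file(file_name, items, data_dir=DATA_DIR):
    """Write items to a JSON file in the data directory, creating it if needed."""
    path = Path(data_dir) / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(items, handle, indent=2, sort_keys=True)
```

Returning an empty object for a missing file is a design choice made for this sketch; the actual helper may raise an error instead.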

Notes

Before running the scripts, make sure to create the necessary data files in the data directory:

skills.json
organizations.json
badges.json
search-terms.json

Each of these files should contain an empty JSON object {} if there is no initial data.
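
The setup step above can be automated. The following is a small sketch that creates any missing data file with an empty JSON object and leaves existing files untouched; the function name ensure_data_files is hypothetical and is not part of the project.

```python
from pathlib import Path

# File names taken from the list above.
DATA_FILES = ("skills.json", "organizations.json", "badges.json", "search-terms.json")


def ensure_data_files(data_dir="data"):
    """Create each missing data file containing {} and return the names created."""
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name in DATA_FILES:
        path = directory / name
        if not path.exists():
            path.write_text("{}\n", encoding="utf-8")
            created.append(name)
    return created
```

Calling ensure_data_files() from the project root before the first run should be enough; running it again is harmless because existing files are skipped.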

Joeri-Abbo/python-credly-scraper
