AndrewKhassapov/website-to-pdf

Website to PDF

A web crawler that saves the pages of a website as .pdf files.

🌐🕸️ ⏩ 📂📜

Requirements:

✔️ 🐍 Python 3.x environment

✔️ 📁 wkhtmltopdf installed on the system

✔️ 🐍 pdfkit library from PyPI. pdfkit is a Python wrapper for wkhtmltopdf.

✔️ 🐍 BeautifulSoup 4 library from PyPI
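A quick way to confirm the requirements above before running the crawler is sketched below. It is not part of the repository: `check_requirements` is a hypothetical helper, and it assumes the wkhtmltopdf executable is named `wkhtmltopdf` on the PATH and that BeautifulSoup 4 is importable as `bs4`.

```python
import importlib.util
import shutil


def check_requirements():
    """Return a dict mapping each requirement to True if it appears to be available.

    Hypothetical helper, not part of website-to-pdf.py.
    """
    return {
        # wkhtmltopdf is a system binary, so look for it on the PATH.
        "wkhtmltopdf": shutil.which("wkhtmltopdf") is not None,
        # pdfkit and bs4 are Python packages, so check whether they can be imported.
        "pdfkit": importlib.util.find_spec("pdfkit") is not None,
        "bs4": importlib.util.find_spec("bs4") is not None,
    }


if __name__ == "__main__":
    for name, available in check_requirements().items():
        print(f"{name}: {'found' if available else 'MISSING'}")
```

Anything reported as missing would need to be installed first, for example the Python packages with `pip install pdfkit beautifulsoup4`.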

How to use:

▶️ Set the list urls_to_parse to all the URLs you want to save in .pdf format.

urls_to_parse = ["<URL_1>", "<URL_2>", ..., "<URL_N>"] # Where URL_n is your desired URL.

The list can be built in either of two ways:

🅰️ ➡️ Using the return value of get_url_list_from_site( <MY SITE eg. http://example.com> )

or

🅱️ ➡️ Using the return value of get_url_list_from_file( <MY FILE | DEFAULT = input/urls.txt> )
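The two options above could work roughly as in the following sketch. These are illustrative stand-ins, not the repository's own `get_url_list_from_site` and `get_url_list_from_file`: the names `read_urls_from_file` and `extract_same_site_links` are hypothetical, the file is assumed to hold one URL per line, and the standard library's `html.parser` is used in place of BeautifulSoup so the example runs without third-party packages.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse


def read_urls_from_file(path="input/urls.txt"):
    """Return the non-empty lines of a text file as a list of URLs (assumed: one URL per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_same_site_links(html, base_url):
    """Return absolute, de-duplicated links in `html` that stay on the host of `base_url`."""
    collector = _LinkCollector()
    collector.feed(html)
    site = urlparse(base_url).netloc
    urls = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment so the same page is listed once.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == site and absolute not in urls:
            urls.append(absolute)
    return urls
```

In this sketch the HTML is passed in as a string. Fetching the page, and deciding how deep to follow links, is left to the caller.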

▶️ Run website-to-pdf.py

▶️ Each URL is saved as a .pdf file in the output/ directory, relative to the location of website-to-pdf.py
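The final step can be pictured with the minimal sketch below. It is an assumption about how the saving could be done, not the script's actual code: `pdf_path_for` and `save_urls_as_pdf` are hypothetical names, and the file-naming scheme is invented for illustration. The only pdfkit call used is `pdfkit.from_url(url, output_path)`.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def pdf_path_for(url, output_dir="output"):
    """Build an output path for `url` (hypothetical naming scheme: host and path, made filename-safe)."""
    parsed = urlparse(url)
    stem = re.sub(r"[^A-Za-z0-9]+", "_", parsed.netloc + parsed.path).strip("_")
    return Path(output_dir) / f"{stem or 'index'}.pdf"


def save_urls_as_pdf(urls_to_parse, output_dir="output"):
    """Render each URL to a .pdf file inside `output_dir` using pdfkit."""
    # Imported here so the naming helper above works even without pdfkit installed.
    import pdfkit

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for url in urls_to_parse:
        # pdfkit.from_url invokes wkhtmltopdf, which must be installed on the system.
        pdfkit.from_url(url, str(pdf_path_for(url, output_dir)))
```

Under this naming scheme, `save_urls_as_pdf(["http://example.com/docs/page.html"])` would write `output/example_com_docs_page_html.pdf`.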

License:

MIT license compliant. The software is provided as is, and all content is free to use and modify.

andrewkhassapov github [1]

Footnotes

  1. GitHub shields provided by Shields.io