mwpdfify

Batch-download pages from MediaWiki sites (either all pages or the pages in one category) as printable PDFs.

Install / Run

pip install mwpdfify

...or clone the repo and run pip install .

...or download and run src/mwpdfify.py directly

There are two PDF rendering backends to choose from: pdfkit (installed as a dependency by default) or weasyprint. Use pip install -r requirements.txt to install both, or install only the one you need. If you use pdfkit, also install wkhtmltopdf on your system, since pdfkit calls it to render pages.
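For orientation, here is a minimal sketch of how each backend turns a page URL into a PDF, using pdfkit's `from_url` and WeasyPrint's `HTML(url=...).write_pdf(...)`. The wrapper `render_pdf` is hypothetical and not taken from mwpdfify's source; the imports sit inside the function so that only the chosen backend needs to be installed.

```python
def render_pdf(url: str, out_path: str, use_weasyprint: bool = False) -> None:
    """Render the page at `url` to a PDF at `out_path` (hypothetical helper)."""
    if use_weasyprint:
        # WeasyPrint fetches and lays out the HTML itself; no external binary needed.
        from weasyprint import HTML
        HTML(url=url).write_pdf(out_path)
    else:
        # pdfkit shells out to the wkhtmltopdf binary, which must be installed.
        import pdfkit
        pdfkit.from_url(url, out_path)
```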

Usage

Get the address of the root of your wiki, where its api.php and index.php reside. Typically it's identical to the site's root (/). For Wikipedia it's at /w/; tell me if there are other exceptions ;)
(optional) If you want only a specific category, get its title (in the form Category:XXX)
Run the script, e.g.:
- mwpdfify https://lycoris-recoil.fandom.com - Download all pages (as in Special:AllPages) from Lycoris Recoil Fandom Wiki as PDF
- mwpdfify wiki.archlinux.org -c Category:Installation_process - Download all pages under Category:Installation_process from ArchWiki as PDF
- mwpdfify https://en.wikipedia.org/w/ -c Category:Guangzhou_Metro_stations -l 10 -t 4 - Download all pages under Category:Guangzhou_Metro_stations (excluding subcategories) from Wikipedia, using 4 download threads and requesting at most 10 results per API query
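The examples above pass either a full URL or a bare domain, plus an optional category and per-query limit. The sketch below shows one way those inputs could map onto MediaWiki Action API queries (`list=allpages` for everything, `list=categorymembers` for one category), following the server's `continue` tokens until all titles are listed. It is an illustration under assumptions, not mwpdfify's code: assuming HTTPS for bare domains, treating `-l 0` as `max` (per the help text below), and using `cmtype=page` to skip subcategories are all choices made here, and `fetch_json` is a hypothetical injected HTTP helper.

```python
from typing import Callable, Iterator, Optional


def normalize_root(url: str) -> str:
    """Assume https:// when no scheme is given and ensure a trailing slash."""
    if "://" not in url:
        url = "https://" + url
    return url if url.endswith("/") else url + "/"


def list_titles(
    root: str,
    fetch_json: Callable[[str, dict], dict],
    category: Optional[str] = None,
    limit: int = 0,
) -> Iterator[str]:
    """Yield page titles from Special:AllPages or from a single category.

    `fetch_json(url, params)` is a hypothetical helper that performs a GET
    request and returns the decoded JSON body.
    """
    api = normalize_root(root) + "api.php"
    size = "max" if limit == 0 else str(limit)  # 0 = as many as the server allows
    if category:
        params = {"action": "query", "format": "json", "list": "categorymembers",
                  "cmtitle": category, "cmlimit": size, "cmtype": "page"}
        key = "categorymembers"
    else:
        params = {"action": "query", "format": "json", "list": "allpages",
                  "aplimit": size}
        key = "allpages"
    while True:
        data = fetch_json(api, params)
        for page in data["query"][key]:
            yield page["title"]
        if "continue" not in data:
            break
        # Feed the server's continuation tokens back into the next request.
        params = {**params, **data["continue"]}
```

With the third-party `requests` package, `fetch_json` could be `lambda u, p: requests.get(u, params=p).json()`. Injecting it keeps the pagination logic testable without network access.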

The downloaded PDFs should be available in a folder named after the site's domain name, in the current directory.
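A small sketch of how a page title might map to a file inside that domain-named folder. The function name and the character-replacement rule are hypothetical: titles can contain "/" (subpages) and other characters that are unsafe in file names, so this version replaces them with "_". The caller would create the folder first, e.g. with `os.makedirs(domain, exist_ok=True)`.

```python
import os
import re
from urllib.parse import urlparse


def output_path(root: str, title: str) -> str:
    """Map a page title to <domain>/<title>.pdf relative to the current directory."""
    if "://" not in root:
        root = "https://" + root  # assume HTTPS for bare domains
    domain = urlparse(root).netloc
    # Replace characters that are invalid or risky in file names on common systems.
    safe = re.sub(r'[\\/:*?"<>|]', "_", title)
    return os.path.join(domain, safe + ".pdf")
```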

See below for other parameters:

usage: mwpdfify [-h] [-c CATEGORY] [-p] [-t THREADS] [-l LIMIT] [-w] url

positional arguments:
  url                   site root of destination site

options:
  -h, --help            show this help message and exit
  -c CATEGORY, --category CATEGORY
                        Download only a specified category
  -p, --no-printable    Force normal instead of printable version of pages
  -t THREADS, --threads THREADS
                        Number of download threads, defaults to 8
  -l LIMIT, --limit LIMIT
                        Limit of JSON info returned at once, defaults to maximum
                        (0)
  -w, --use-weasyprint  Use weasyprint as PDF rendering backend
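The -t option sets how many pages are downloaded in parallel. A minimal sketch of that pattern with a standard-library thread pool is shown below; `download_all` and the `download_one` callback are hypothetical names, and collecting failed titles instead of aborting is a design choice made here, not documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List


def download_all(titles: Iterable[str], download_one: Callable[[str], None],
                 threads: int = 8) -> List[str]:
    """Run download_one(title) on a pool of `threads` workers.

    Returns the titles whose download raised, so they can be reported or retried.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download_one, t): t for t in titles}
        for fut in as_completed(futures):
            try:
                fut.result()  # re-raises any exception from the worker
            except Exception:
                failed.append(futures[fut])
    return failed
```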

Known issues

&printable=yes is deprecated in recent versions of MediaWiki, and no replacement API is provided, so pages from some wikis may have layout issues. Fandom wikis are especially affected, since their pages also contain ads.
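For reference, the printable version of a page is requested through index.php with a `printable=yes` query parameter. The sketch below builds such a URL, dropping the parameter when printable output is disabled (as -p does). The helper name is hypothetical, and converting spaces to underscores follows MediaWiki's usual title convention rather than anything stated in this README.

```python
from urllib.parse import urlencode


def page_url(root: str, title: str, printable: bool = True) -> str:
    """Build the index.php URL for a page, optionally with printable=yes."""
    if "://" not in root:
        root = "https://" + root  # assume HTTPS for bare domains
    if not root.endswith("/"):
        root += "/"
    params = {"title": title.replace(" ", "_")}  # MediaWiki uses _ for spaces
    if printable:  # -p / --no-printable would pass printable=False
        params["printable"] = "yes"
    # Keep ":" and "/" readable (namespaces, subpages); encode everything else.
    return root + "index.php?" + urlencode(params, safe=":/")
```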
Recursively downloading pages from a category's subcategories is currently not supported.

Changelog

v1.1.2 (2022/09/30):
- Set pdfkit as required dependency
v1.1 (2022/09/04):
- Changed address handling logic
- Bug fixes
v1.0 (2022/09/03):
- Initial release

License

LGPLv3

curiousclinician/mwpdfify
