pg-bulk-ingest

A Python utility function for ingesting data into SQLAlchemy-defined PostgreSQL tables. It automatically migrates the tables as needed and allows concurrent reads as much as possible.

Allowing concurrent writes is not an aim of pg-bulk-ingest. It is designed for use in ETL pipelines where PostgreSQL is used as a data warehouse, and the only writes to the table are from pg-bulk-ingest. It is assumed that there is only one pg-bulk-ingest running against a given table at any one time.

Features

pg-bulk-ingest exposes a single function as its API that:

Creates the tables if necessary
Migrates any existing tables if necessary, minimising locking
Ingests data in batches, where each batch is ingested in its own transaction
Handles "high-watermarking" to carry on from where a previous ingest finished or errored
Optionally performs an "upsert", matching rows on primary key
Optionally deletes all existing rows before ingestion
Optionally calls a callback just before each batch is visible to other database clients
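
To make the batching, high-watermark, upsert and callback ideas above more concrete, here is a minimal conceptual sketch. It is not pg-bulk-ingest's API and does not use PostgreSQL or SQLAlchemy: it uses SQLite from the Python standard library so it can run anywhere. The function name `ingest_batches`, the `items` and `ingest_state` tables, and the `on_before_visible` parameter are all hypothetical names chosen for illustration. It does not attempt migrations or lock minimisation.

```python
import sqlite3


def ingest_batches(conn, batches, on_before_visible=None):
    """Hypothetical helper for illustration only (not pg-bulk-ingest's API).

    `batches` is a callable that takes the stored high-watermark (or None)
    and yields (high_watermark, rows) tuples, where each row is (id, value).
    """
    # Create the tables if necessary
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingest_state ("
        "k INTEGER PRIMARY KEY CHECK (k = 1), high_watermark INTEGER)"
    )
    conn.commit()

    # Read the high-watermark left behind by a previous ingest, if any
    row = conn.execute("SELECT high_watermark FROM ingest_state").fetchone()
    high_watermark = row[0] if row else None

    for batch_watermark, rows in batches(high_watermark):
        # One transaction per batch: `with conn` commits on success and
        # rolls back if anything inside the block raises
        with conn:
            # Upsert: a row with an existing primary key is replaced
            conn.executemany(
                "INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)", rows
            )
            # Store the high-watermark in the same transaction as the data
            conn.execute(
                "INSERT OR REPLACE INTO ingest_state (k, high_watermark) VALUES (1, ?)",
                (batch_watermark,),
            )
            # Called just before the commit makes the batch visible to others
            if on_before_visible is not None:
                on_before_visible(batch_watermark)


def example_batches(high_watermark):
    # Made-up source data; the second batch repeats id 2 to exercise the upsert
    source = [
        (1, [(1, "a"), (2, "b")]),
        (2, [(2, "B"), (3, "c")]),
    ]
    for batch_watermark, rows in source:
        # Only yield batches beyond where the previous ingest got to
        if high_watermark is None or batch_watermark > high_watermark:
            yield batch_watermark, rows


conn = sqlite3.connect(":memory:")
ingest_batches(conn, example_batches)
ingest_batches(conn, example_batches)  # nothing beyond the stored high-watermark
```

Because the high-watermark is written in the same transaction as its batch, a run that fails partway leaves the stored value pointing at the last batch that committed, so the next run can carry on from there. The documentation mentioned below is the authoritative source for how pg-bulk-ingest itself is called.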

Visit the pg-bulk-ingest documentation for usage instructions.

