uktrade/pg-bulk-ingest

pg-bulk-ingest

A Python utility function for ingesting data into SQLAlchemy-defined PostgreSQL tables. It automatically migrates the tables as needed and allows concurrent reads as much as possible.

Allowing concurrent writes is not an aim of pg-bulk-ingest. It is designed for use in ETL pipelines where PostgreSQL is used as a data warehouse, and the only writes to the table are from pg-bulk-ingest. It is assumed that only one instance of pg-bulk-ingest runs against a given table at any one time.

Features

pg-bulk-ingest exposes a single function as its API that:

  • Creates the tables if necessary
  • Migrates any existing tables if necessary, minimising locking
  • Ingests data in batches, where each batch is ingested in its own transaction
  • Handles "high-watermarking" to carry on from where a previous ingest finished or errored
  • Optionally performs an "upsert", matching rows on primary key
  • Optionally deletes all existing rows before ingestion
  • Optionally calls a callback just before each batch is visible to other database clients
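
To make the batching, high-watermark, upsert and callback behaviours listed above concrete, here is a minimal conceptual sketch. It is not pg-bulk-ingest's API or implementation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the names `ingest_sketch`, `batches`, `my_table` and the `watermark` table are hypothetical, chosen only for illustration. Table migration and the delete option are left out. In PostgreSQL the upsert would typically be written with `INSERT ... ON CONFLICT` rather than SQLite's `INSERT OR REPLACE`.

```python
import sqlite3


def batches(high_watermark):
    # Hypothetical data source: yields (new_high_watermark, rows) for every
    # batch that is newer than the high watermark stored by a previous run.
    source = {
        1: [(1, 'a'), (2, 'b')],
        2: [(2, 'B'), (3, 'c')],  # id 2 appears again, so it will be upserted
    }
    for mark in sorted(source):
        if high_watermark is None or mark > high_watermark:
            yield mark, source[mark]


def ingest_sketch(conn, batches, on_before_visible=None):
    # Create the target table and a single-row watermark table if necessary.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS my_table (id INTEGER PRIMARY KEY, value TEXT)'
    )
    conn.execute(
        'CREATE TABLE IF NOT EXISTS watermark (id INTEGER PRIMARY KEY, mark INTEGER)'
    )
    conn.commit()

    # Carry on from where a previous ingest finished or errored.
    row = conn.execute('SELECT mark FROM watermark WHERE id = 1').fetchone()
    high_watermark = row[0] if row else None

    for mark, rows in batches(high_watermark):
        # One transaction per batch: committed on success, rolled back on error.
        with conn:
            # Upsert, matching rows on the primary key.
            conn.executemany(
                'INSERT OR REPLACE INTO my_table (id, value) VALUES (?, ?)', rows
            )
            # Store the new high watermark in the same transaction as the data.
            conn.execute(
                'INSERT OR REPLACE INTO watermark (id, mark) VALUES (1, ?)', (mark,)
            )
            # Runs before the commit, so before other clients can see the batch.
            if on_before_visible is not None:
                on_before_visible(conn, mark)


conn = sqlite3.connect(':memory:')
ingest_sketch(conn, batches)
print(conn.execute('SELECT id, value FROM my_table ORDER BY id').fetchall())
# prints [(1, 'a'), (2, 'B'), (3, 'c')]
```

Because the watermark is written in the same transaction as the batch it describes, a failure part-way through a batch rolls back both, and the next run resumes from the last batch that was fully committed.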

Visit the pg-bulk-ingest documentation for usage instructions.