Data Vault Pipeline Description (DVPD)

Concept and reference implementation

(C) Matthias Wegner, cimt ag

Creative Commons License CC BY-ND 4.0

This repository contains the documentation of the "Data Vault Pipeline Description" concept and a reference implementation with multiple test cases and examples.

The concept in "3 words"

The Data Vault Pipeline Description (DVPD) defines a document syntax to describe all metadata that is needed to implement a process which loads one source object into a data vault model.

This provides a standardized interface between all steps of the implementation workflow and decouples the tools that are used during design and implementation. As a document, the DVPD also represents an encapsulated, deployable artifact and therefore supports the implementation of automated CI/CD workflows.
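
To make the idea of a metadata document as an interface more concrete, the following is a minimal Python sketch. All key names (`pipeline_name`, `fields`, `tables`, `targets`, `stereotype`, `parent`) and the helper `columns_per_table` are hypothetical and chosen for illustration only. They are not taken from the DVPD core syntax, which is defined in the reference documentation of this repository.

```python
import json

# Hypothetical pipeline description: one source object mapped to data vault tables.
# The key names are illustrative assumptions, NOT the actual DVPD syntax.
PIPELINE_DOCUMENT = """
{
  "pipeline_name": "customer_example",
  "fields": [
    {"field_name": "CUSTOMER_ID", "targets": ["customer_hub"]},
    {"field_name": "NAME", "targets": ["customer_sat"]},
    {"field_name": "CITY", "targets": ["customer_sat"]}
  ],
  "tables": [
    {"table_name": "customer_hub", "stereotype": "hub"},
    {"table_name": "customer_sat", "stereotype": "sat", "parent": "customer_hub"}
  ]
}
"""


def columns_per_table(document_text):
    """Derive, for every declared table, the source fields mapped to it."""
    document = json.loads(document_text)
    mapping = {table["table_name"]: [] for table in document["tables"]}
    for field in document["fields"]:
        for target in field["targets"]:
            # Reject mappings that point at a table the document never declares.
            if target not in mapping:
                raise ValueError(
                    f"field {field['field_name']} targets undeclared table {target}"
                )
            mapping[target].append(field["field_name"])
    return mapping


if __name__ == "__main__":
    for table_name, columns in columns_per_table(PIPELINE_DOCUMENT).items():
        print(table_name, columns)
```

In this sketch, a consumer such as a code generator only needs the document itself, not the tool that produced it, which is the kind of decoupling described above.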

The full documentation is in this repository. The best starting point is DVPD_Introduction_and_orientation.md.

Motivation

Loading data into a data warehouse is a complex task, even when using the Data Vault methods, which provide a lot of standardization and generalization. Many tools and frameworks try to support the modelling and implementation process.

The functions needed are: specification of the use case, specification and analysis of the source data structure, modelling of the Data Vault and mapping of the data, implementation of the load process (fetch data from the source, transform it, and load it into the data vault model), deployment of the processes, scheduling and execution of the processes, and monitoring of their progress. Each of these steps is deeply complex in itself. A product that supports all of these phases with equal quality and functional flexibility is nearly impossible to implement.*

So data warehouse platforms often contain a bundle of tools, mixing commercial products and self-written code. One major function needed in these workflows is the communication of the metadata that is created during the analysis and modelling steps. This metadata is needed for the implementation and, in the best case, can be used to generate the processing automatically.

DVPD provides a format to solve this problem.

*Such a product needs to cover a wide variety of scenarios, but from the perspective of a single project, only a small number of them is needed. You don't pay the price for 300 functions when you only need 10 of them.

What you find in this repository

Concept Documentation

Description of the concept
Reference of the core syntax of DVPD
Analysis of the use case variations to be covered by the syntax
  a. Data mapping variation taxonomy
  b. Data mapping dependent process generation
  c. Partitioned deletion scenarios

Reference implementation

PostgreSQL tables and views to implement a DVPD compiler
Documentation about the structure and usage of the DVPD views
PostgreSQL tables and views to implement automated testing of the compiler
Test sets
Python scripts to deploy the tables and views automatically
