Releases: andrjas/data_check

0.19.0

18 Mar 06:06

Added

  • Python 3.12 support
  • ruff for linting

Changed

  • using pandas.Timestamp instead of datetime for date/datetime columns
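As a rough illustration of what this change means for values in date/datetime columns, the sketch below uses plain pandas, not data_check itself. Parsed values are pandas.Timestamp objects. These subclass datetime.datetime, compare equal to the matching datetime, and also carry nanosecond precision. The sample dates are arbitrary.

```python
import datetime

import pandas as pd

# Parsing a date column with pandas yields pandas.Timestamp values
dates = pd.to_datetime(pd.Series(["2000-01-01", "2000-12-31"]))
first = dates.iloc[0]

print(type(first).__name__)  # Timestamp
# pandas.Timestamp is a subclass of datetime.datetime ...
print(isinstance(first, datetime.datetime))  # True
# ... and compares equal to the equivalent datetime value
print(first == datetime.datetime(2000, 1, 1))  # True
# Unlike datetime, a Timestamp keeps nanosecond precision
print(pd.Timestamp("2000-01-01 00:00:00.000000001").nanosecond)  # 1
```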

Removed

  • custom datetime parsing
  • isort in pre-commit (using ruff instead)
  • black in pre-commit (using ruff instead)
  • eradicate in pre-commit (using ruff instead)

0.18.0

30 Nov 16:30

Added

  • partial support for DuckDB
  • partial support for Databricks

Changed

  • updated dependencies
  • simplified dependency update process
  • updated to Pydantic 2
  • simplified db integration tests

Removed

  • Python 3.8 support

0.17.0

18 Jul 05:01

Added

  • pipeline YAML validation via pydantic
  • more breakpoint step features and documentation

Changed

  • replaced 'overall result' with 'summary'

Fixed

  • load_template and load_lookups called twice in run
  • generating sorted csv for checks
  • updated SQLAlchemy links to 2.0
  • print exception if merging non-unique columns

0.16.0

09 Jun 07:06

Added

  • CI with ARM64 MSSQL driver
  • oracledb as alternative to cx_oracle
  • --use-process parameter to switch back to ProcessPoolExecutor

Changed

  • upgraded to pandas 2
  • upgraded to SQLAlchemy 2
  • switched to ThreadPoolExecutor by default
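The executor switch can be pictured with the standard library's concurrent.futures. The sketch below uses threads by default and processes only when a flag is set, mirroring the --use-process option. The names make_executor and run_checks are hypothetical and are not data_check's actual code.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional


def make_executor(use_process: bool = False, max_workers: Optional[int] = None) -> Executor:
    """Hypothetical helper: threads by default, processes only when requested."""
    if use_process:
        # Separate processes: submitted callables must be picklable (top-level functions)
        return ProcessPoolExecutor(max_workers=max_workers)
    # Threads share one interpreter and are cheap to start
    return ThreadPoolExecutor(max_workers=max_workers)


def run_checks(check_names, run_check, use_process: bool = False):
    """Run every check through the chosen executor and collect results in input order."""
    with make_executor(use_process) as executor:
        return list(executor.map(run_check, check_names))


if __name__ == "__main__":
    print(run_checks(["a.sql", "b.sql"], len))  # [5, 5]
```

Both executors share the Executor interface, so the calling code stays the same whichever one the flag selects.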

0.15.0

17 Feb 06:58

Added

  • 'data_check init' to create projects and pipelines
  • 'append' as alias for append-mode in cli and pipelines
  • 'ping --wait' and --timeout/--retry
  • Python 3.11 support
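A wait-for-connection loop like 'ping --wait' could look roughly like the sketch below. It assumes --timeout is the total time to wait in seconds and --retry is the pause between attempts. Those semantics and the helper name wait_for_connection are assumptions, not taken from data_check.

```python
import time
from typing import Callable


def wait_for_connection(ping: Callable[[], bool], timeout: float = 5.0, retry: float = 1.0) -> bool:
    """Hypothetical helper: call ping() until it succeeds or `timeout` seconds pass.

    `retry` is assumed to be the pause in seconds between attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        if ping():
            return True
        # Give up if the next attempt would start after the deadline
        if time.monotonic() + retry > deadline:
            return False
        time.sleep(retry)


if __name__ == "__main__":
    attempts = iter([False, False, True])  # database comes up on the third ping
    print(wait_for_connection(lambda: next(attempts), timeout=1.0, retry=0.01))  # True
```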

Changed

  • io module is renamed to file_ops
  • running a csv file without a matching sql file fails; if a matching sql file exists, the csv check is run
  • MSSQL uses arm64 image for CI

Fixed

  • NA/NaT should be treated equally in checks
  • CTRL+C should work in Windows
  • 'data_check gen' works with full table checks
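One way to treat NA and NaT as equal when comparing cells is to test for missing values with pandas.isna before comparing. A plain == is unreliable here: NaN != NaN, and pd.NA == pd.NA evaluates to pd.NA rather than True. The function values_match below is a hypothetical sketch for scalar cell values and is not data_check's implementation.

```python
import pandas as pd


def values_match(expected, actual) -> bool:
    """Compare two scalar cell values, treating every missing marker as equal."""
    expected_missing = bool(pd.isna(expected))
    actual_missing = bool(pd.isna(actual))
    if expected_missing or actual_missing:
        # Equal only if both sides are missing (None, NaN, pd.NA or pd.NaT)
        return expected_missing and actual_missing
    return bool(expected == actual)


if __name__ == "__main__":
    print(values_match(pd.NA, pd.NaT))  # True
    print(values_match(pd.NaT, pd.Timestamp("2000-01-01")))  # False
```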

Removed

  • custom docker images for CI

0.14.0

13 Jan 06:48

Added

  • pre-commit hooks with various tools for code quality
  • project wide default_load_mode configuration
  • pipelines: added 'files' for 'sql' to deprecate 'sql_files'
  • pipelines: added 'run' as alias for 'check'
  • tests that pipeline steps match the cli
  • pipelines: 'write_check' for 'sql'
  • documentation for 'fake' pipeline step
  • pipelines: added 'table' and 'file' for 'load' to deprecate 'load_table'
  • running data_check_pipeline.yml directly to execute the pipeline

Changed

  • refactored TableInfo into Table
  • moved integration tests into pytest
  • upgraded dependencies

Fixed

  • load fails if csv doesn't have all columns

Deprecated

  • pipelines: 'sql_files' is deprecated, use 'sql' instead
  • pipelines: 'load_table' is deprecated

0.13.0

29 Sep 04:39

Added

  • upsert mode for loading data into tables
  • pipelines: added 'mode' to deprecate 'load_mode'
  • env variable DATA_CHECK_CONNECTION can override default connection
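Upsert means inserting rows that are new and updating rows whose key already exists. The sketch below shows the idea with SQLite's INSERT ... ON CONFLICT DO UPDATE, which needs SQLite 3.24 or later, through Python's sqlite3 module. data_check supports several databases and may implement upsert differently. The customers table and the upsert_rows helper are hypothetical.

```python
import sqlite3


def upsert_rows(conn: sqlite3.Connection, rows) -> None:
    """Insert new rows and update existing ones, matched on the primary key."""
    # On a primary-key clash, overwrite the existing row's name with the incoming value
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    upsert_rows(conn, [(1, "Alice"), (2, "Bob")])
    upsert_rows(conn, [(2, "Robert"), (3, "Carol")])
    print(conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
    # [(1, 'Alice'), (2, 'Robert'), (3, 'Carol')]
```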

Changed

  • printing exception on failure without --traceback
  • upgraded dependencies
  • documentation theme

Fixed

  • Oracle: using VARCHAR2 instead of CLOB to load strings and large decimals
  • bug in runner.executor when calculating max_workers

Deprecated

  • pipelines: 'load_mode' is deprecated, use 'mode' instead

Removed

  • workaround for replace mode
  • support for python 3.7
  • importlib-metadata dependency

0.12.0

13 Apr 16:23

Added

  • test data generator with Faker

Changed

  • CLI uses subcommands
  • load and load_table in pipeline YAML
  • CI uses DB connections via secrets

Fixed

  • loading mixed date/null values

0.11.1

16 Feb 17:09

Fixed

  • SettingWithCopyWarning in failing checks

0.11.0

15 Feb 17:20

Added

  • --sql and --sql-files use lookups
  • full table checks
  • --print --diff to print only changed columns
  • --write-check to generate a CSV check
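The idea behind printing only changed columns can be sketched in plain Python. The sketch collects the columns that differ in at least one pair of expected and actual rows. It assumes rows are dicts aligned by position, for example both sides sorted the same way. The helper changed_columns is hypothetical and does not reproduce data_check's --diff output format.

```python
from __future__ import annotations


def changed_columns(expected: list[dict], actual: list[dict]) -> list[str]:
    """Return the columns whose value differs in at least one row pair.

    Rows are assumed to be aligned by position.
    """
    columns = list(expected[0]) if expected else []
    return [
        col
        for col in columns
        if any(exp.get(col) != act.get(col) for exp, act in zip(expected, actual))
    ]


if __name__ == "__main__":
    expected = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "Paris"}]
    actual = [{"id": 1, "city": "Munich"}, {"id": 2, "city": "Paris"}]
    # Keep the key column for context, plus only the columns that changed
    keep = ["id"] + [c for c in changed_columns(expected, actual) if c != "id"]
    for exp, act in zip(expected, actual):
        if exp != act:
            print({c: (exp[c], act[c]) for c in keep})
    # {'id': (1, 1), 'city': ('Berlin', 'Munich')}
```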

Changed

  • example project moved into subfolder
  • split main into cli module
  • rewrote cli testing using click.testing.CliRunner
  • --sql with --output doesn't print to the console

Fixed

  • recursive process spawning
  • pipeline does not stop on error
  • log file is written into project path
  • --print with empty set prints result when failing