- Python 3.12 support
- ruff for linting
- using pandas.Timestamp instead of datetime for date/datetime columns
- custom datetime parsing
- isort in pre-commit (using ruff instead)
- black in pre-commit (using ruff instead)
- eradicate in pre-commit (using ruff instead)
- partial support for DuckDB
- partial support for Databricks
- updated dependencies
- simplified dependency update process
- updated to Pydantic 2
- simplified db int tests
- Python 3.8 support
- pipeline YAML validation via pydantic
- more breakpoint step features and documentation
- replaced 'overall result' with 'summary'
- load_template and load_lookups called twice in run
- generating sorted csv for checks
- updated SQLAlchemy links to 2.0
- print exception if merging non-unique columns
- CI with ARM64 MSSQL driver
- oracledb as an alternative to cx_oracle
- --use-process parameter to switch back to ProcessPoolExecutor
- upgraded to pandas 2
- upgraded to SQLAlchemy 2
- switched to ThreadPoolExecutor by default
- 'data_check init' to create projects and pipelines
- 'append' as alias for append-mode in CLI and pipelines
- 'ping --wait' and --timeout/--retry
- Python 3.11 support
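A minimal sketch of how the new ping options combine to wait for a database, e.g. in CI before running checks (the timeout and retry values are illustrative, not defaults):

```bash
# Wait until the default connection responds, retrying up to the given limits.
data_check ping --wait --timeout 30 --retry 5
```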
- io module is renamed to file_ops
- running a csv file without a matching sql file will fail; otherwise the csv check is run
- MSSQL uses arm64 image for CI
- NA/NaT should be treated equally in checks
- CTRL+C should work on Windows
- 'data_check gen' works with full table checks
- custom docker images for CI
- pre-commit hooks with various tools for code quality
- project wide default_load_mode configuration
- pipelines: added 'files' for 'sql' to deprecate 'sql_files'
- pipelines: added 'run' as alias for 'check'
- tests that pipeline steps match the CLI
- pipelines: 'write_check' for 'sql'
- documentation for 'fake' pipeline step
- pipelines: added 'table' and 'file' for 'load' to deprecate 'load_table'
- running data_check_pipeline.yml directly to execute the pipeline
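A sketch of how the newer pipeline keys fit together in data_check_pipeline.yml; the paths are illustrative assumptions, not from the project docs:

```yaml
steps:
  - run: checks/basic    # 'run' is an alias for 'check'
  - sql:
      files:             # 'files' supersedes the deprecated 'sql_files'
        - prepare/setup.sql
```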
- refactored TableInfo into Table
- moved integration tests into pytest
- upgraded dependencies
- load fails if csv doesn't have all columns
- pipelines: 'sql_files' is deprecated, use 'sql' instead
- pipelines: 'load_table' is deprecated, use 'load' instead
- upsert mode for loading data into tables
- pipelines: added 'mode' to deprecate 'load_mode'
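A sketch of a load step using the newer keys and the upsert mode (table and file names are illustrative):

```yaml
- load:
    table: main.customers          # 'table'/'file' supersede the deprecated 'load_table'
    file: load_data/customers.csv
    mode: upsert                   # 'mode' supersedes the deprecated 'load_mode'
```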
- env variable DATA_CHECK_CONNECTION can override default connection
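For example, a run can be pointed at another configured connection without touching the project config (the connection name 'test' is an assumption):

```bash
# Override the default connection for this invocation only.
DATA_CHECK_CONNECTION=test data_check
```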
- printing exception on failure without --traceback
- upgraded dependencies
- documentation theme
- Oracle: using VARCHAR2 instead of CLOB to load strings and large decimals
- bug in runner.executor when calculating max_workers
- pipelines: 'load_mode' is deprecated, use 'mode' instead
- workaround for replace mode
- support for Python 3.7
- importlib-metadata dependency
- test data generator with Faker
- CLI uses subcommands
- load and load_table in pipeline YAML
- CI uses DB connections via secrets
- loading mixed date/null values
- SettingWithCopyWarning in failing checks
- --sql and --sql-files use lookups
- full table checks
- --print --diff to print only changed columns
- --write-check to generate a CSV check
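Assumed invocations for the two new flags; the exact usage may differ from the project docs:

```bash
# Run a query and write it as a CSV check (path and query are illustrative).
data_check --sql "select id, name from customers" --write-check checks/customers.csv

# On a failing check, print only the columns that changed.
data_check checks/customers --print --diff
```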
- example project moved into subfolder
- split main into cli module
- rewrote cli testing using click.testing.CliRunner
- --sql with --output doesn't print on console
- recursive process spawning
- pipeline does not stop on error
- log file is written into project path
- --print with empty set prints result when failing
- Python 3.10 support
- tox to test multiple Python versions
- Python 3.6 support
- Python tests in int_test
- --sql --output produces empty lines and \r\r\n on Windows
- --generate produces non-UTF8 files
- lookups
- always_run steps in pipelines
- print parameter in cmd pipeline step
- logging in config file and --log argument
- tests that --sql and sql in a pipeline use templates
- test for non-ASCII characters in column names
- '' as escape character in CSVs
- --print/--print-json with --verbose prints the output even if it's matching
- --gen argument (use -g or --generate instead)
- handling pd.NaT
- --generate escapes '#' in CSV
- --ping works with --verbose and --traceback
- Excel support (for checks and loading tables)
- --quiet suppresses all output
- simpler empty dataset checks
- testing if all commands are documented
- tests for nullable date columns
- tests for large dates (9999-12-31)
- date handling: columns from the database and CSV files are better recognized as dates
- tests are split into multiple folders
- cli tests are moved to integration tests
- refactoring: DataCheck class no longer inherits the check classes
- date hints
- checks now fail if an invalid path is given
- SQLite int tests
- --sql parses templates in statement
- can start data_check in a subfolder of a project
- print sql statement in pipeline
- pipeline --generate mode
- refactored int tests using templates and pre-built docker images
- renamed --run-sql/run_sql to --sql-files/sql_files
- marking tests as failed if result length differs
- --print outputs CSV format
- --print output is sorted
- --print-csv (as this is what --print does now)
- --sql parameter to run SQL statements directly from command line and print the result as CSV
- sql pipeline step
- --output/-o parameter to write --sql generated CSV file
- --run-sql prints result as CSV if it is a query
- --print --format json and --print-json
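These flags combine as in this sketch (the query is illustrative):

```bash
# Run a statement and print the result as CSV on the console.
data_check --sql "select 1 as a"

# Write the result to a CSV file instead.
data_check --sql "select 1 as a" --output a.csv
```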
- date hints
- support for large dates (e.g. 9999-12-31 00:00:00)
- CSV checks now convert date columns automatically
- collect_data returns an ordered list, i.e. serial runs are deterministic
- using truncate table instead of delete where possible
- run_sql fails if running in parallel with multiple files in a folder
- CSV handling: numbers with leading zeros are kept as strings
- loading date/timestamp from CSV files
- pipelines
- using Drone CI for integration tests
- renamed --load-method to --load-mode
- old integration test scripts
- loading tables from CSV files
- running any SQL file
- more tests
- integration tests for all supported Python versions
- upgraded dependencies
- minimum Python version is now 3.6.2
- internal refactoring
- unit tests inside integration tests
- --print-format to output failed data in csv instead of pandas format
- don't start the executor pool when running a single file or with --workers=1
- CLI tests
- --force to overwrite files when generating
- --print-csv as shortcut for "--print --print-format csv"
- environment variables support for connection strings
- simple jinja templating for SQL queries
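A minimal sketch of a templated check, assuming a variable min_id supplied through the project's template mechanism:

```sql
-- 'min_id' is an assumed template variable; the query is illustrative.
select id, name
from customers
where id >= {{ min_id }}
```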
- upgraded dependencies
- upgraded SQLAlchemy to 1.4
- stop immediately when using an unknown connection
- CLI tests for python 3.6
- colored output on PowerShell
- --verbose argument
- --traceback argument
- -g argument as an alias for --generate
- colored output
- overall result output
- teardown script for integration tests
- using poetry for packaging and integration tests
- failing integration test when any command fails
- SettingWithCopyWarning from pandas when using --print with a failing test
- using ProcessPoolExecutor for parallel queries
- merge fails if a varchar column only has ints/decimals in result
- SAWarning for Oracle
- initial release