
Releases: vmware/versatile-data-kit

v1.7

28 Feb 15:11
893ffb7

Major features include:

vdk-structlog

Setting structlog_config_preset to LOCAL or CLOUD selects a preset that groups the recommended logging configuration for that use case. Any config options set together with the preset override the preset options.
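
The override rule can be pictured as a dictionary merge in which explicit options win. This is only a sketch of the semantics described above: the preset contents and the option names (format, use_colors) are hypothetical placeholders, not the actual values shipped with vdk-structlog.

```python
# Hypothetical preset contents; the real LOCAL and CLOUD presets are defined by vdk-structlog.
PRESETS = {
    "LOCAL": {"format": "console", "use_colors": True},
    "CLOUD": {"format": "json", "use_colors": False},
}


def effective_config(preset: str, explicit: dict) -> dict:
    """Start from the preset, then let explicitly set options override it."""
    merged = dict(PRESETS[preset.upper()])
    merged.update(explicit)
    return merged


# The explicit "format" wins; "use_colors" still comes from the CLOUD preset.
config = effective_config("cloud", {"format": "ltsv"})
print(config)
```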

Example RAG Pipeline

An example of how to build an end-to-end chatbot using VDK.

vdk dag local execution

You can now execute an entire DAG locally on your machine for testing, without needing to deploy it.

Make sure all data job directories are on the same level, then set the executor type:

export DAGS_JOB_EXECUTOR_TYPE=local

Then run dag job as normal:

vdk run dag-job

Or run it from an IDE as explained here, setting DAGS_JOB_EXECUTOR_TYPE=local as an environment variable in the run configuration.
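
When the run is launched from a script rather than a shell or IDE, the same environment variable can be passed to the process. The sketch below assumes the vdk CLI is on PATH with the DAG plugin installed; the helper names local_dag_command and run_dag_locally are hypothetical.

```python
import os
import subprocess


def local_dag_command(dag_job_dir: str) -> tuple[list[str], dict[str, str]]:
    """Build the command and environment for a local DAG run."""
    env = dict(os.environ)
    env["DAGS_JOB_EXECUTOR_TYPE"] = "local"
    return ["vdk", "run", dag_job_dir], env


def run_dag_locally(dag_job_dir: str) -> int:
    """Run the DAG job with the local executor and return its exit code."""
    command, env = local_dag_command(dag_job_dir)
    return subprocess.run(command, env=env, check=False).returncode


# Building the command does not start anything; call run_dag_locally("dag-job") to execute.
command, env = local_dag_command("dag-job")
```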

See more in VDK DAG documentation

Support for Python 3.12

Added official support and testing for Python 3.12 in VDK plugins and main components.

What's Changed

Full Changelog: v1.6...v1.7

v1.6

31 Jan 15:04
f1c24b5

Major features include:

vdk-oracle database plugin

A new Oracle plugin can be used to execute queries against Oracle DB in both thick and thin mode.
Ingesting data is now supported, including automatic schema inference.

To see more information check the vdk-oracle plugin documentation

vdk-structlog

Various enhancements in vdk-structlog, including syslog handler support, log level parsing, and configuration updates.

Check out more about vdk-structlog in its documentation

VDK Ingestion into Vector Database for RAG initiative started

With the rise in popularity of LLMs and RAG, we see VDK as a core component for getting data where it needs to be. VDK's strength is ETL tasks, which makes it well suited to populating the databases needed for RAG.

For more information check out the VEP

What's Changed

New Contributors

Full Changelog: v1.5...v1.6

Versatile Data Kit 1.5

06 Dec 13:47
ccfca15

Major features include:

Control Service

Data Job Configuration Persistence feature improvements

This release builds on the pre-alpha version of the feature: GraphQL now reads data from the database, the documentation is improved, and test coverage is extended.

vdk-structlog: Log Plugin

Improvements to the vdk-structlog plugin and preparation for its final release.

vdk-datasources: Data sources POC

Adding Data sources initial PoC version which includes:

  • Data Source APIs handling sources, streams and state
  • A new data source is added by implementing IDataSource, IDataSourceConfiguration and IDataSourceStream
  • Partial data source connection management
  • Data Source Ingester that reads from data sources and writes to an existing IIngester
  • An example data source AutoGeneratedDataSource
  • An example job in the function test suite

vdk-oracle: Create oracle plugin

Adding pre-alpha VDK support for connecting and ingesting to an Oracle DB. For further usage details consult the VDK Oracle Plugin readme.

vdk-jupyter: Add alpha support for Jupyter Notebooks

Adding full alpha support for VDK Jupyter integration.

How to get started?

We have prepared a few guides to help with your Jupyter journey:

  • How to Create a Data Job With VDK Notebook
  • How To Develop a Data Job With VDK Notebook
  • How to Convert a Data Job with VDK Notebook
  • How to Deploy a Data Job with VDK Notebook

What's Changed

Full Changelog: v1.4...v1.5

Versatile Data Kit 1.4

25 Oct 14:02
25417af

Major features include:

Control Service

Complete Data Job Configuration Persistence (Pre-alpha)

Storing data job deployment configurations in both Kubernetes and a database is a two-step process that leads to performance degradation, potential data loss, and complexity. Consistently keeping all essential properties in the database can improve efficiency, system reliability, and user experience.

Another important benefit is the ability to track deployment status using the API.

vdk-structlog log plugin

The plugin allows users to configure logging metadata and logging format. It also works with bound loggers.

This plugin allows users to:

  • select the log output format
  • configure the logging metadata
  • display metadata added by bound loggers

See more in its documentation page

vdk-core Error handling changes

Deprecated error reporting patterns

Most generic vdk-core exceptions replaced with domain-specific ones

Test exception propagation to user code

VDK stopped wrapping non-vdk errors in vdk errors. This should result in errors coming from libraries, templates, etc. being propagated to user code. Users should then be able to handle those errors. So now something like this should be easy:

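import logging
import pandas as pd
from vdk.api.job_input import IJobInput

log = logging.getLogger(__name__)

def run(job_input: IJobInput):
    args = dict()
    try:
        job_input.execute_template("csv-risky", args)
    except pd.errors.EmptyDataError as e:
        # The pandas error reaches user code unwrapped and can be handled here.
        log.info("Handling empty data error")
        log.exception(e)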

What's Changed

New Contributors

Full Changelog: v1.3...v1.4

Versatile Data Kit 1.3

27 Sep 12:59
d12b83e

Major features include:

VDK SDK

Add vdk sql-query command (experimental)

CLI command to execute SQL query against VDK managed database.

It should replace vdk <db>-query commands.

export db_default_type=trino
vdk sql-query -q "select * from trino_table" 
id               memory_size_mb    num_vcpus
-------------  ----------------  -----------
50181506DB2F7               256            1
5018A2223FC32               128            1
501883404870A               256            1


vdk sql-query -o json -q "select * from trino_table" 
[
 {"id": "50181506DB2F7", "memory_size_mb": 256, "num_vcpus": 1}, 
 {"id": "5018A2223FC32", "memory_size_mb": 128, "num_vcpus": 1},
 {"id": "501883404870A", "memory_size_mb": 256, "num_vcpus": 1}
]
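
The JSON output format is convenient for post-processing in scripts. A minimal sketch, using the sample output from the example above as a literal string (in practice it would be captured from the command's standard output):

```python
import json

# Sample output copied from the `vdk sql-query -o json` example above.
raw_output = """
[
 {"id": "50181506DB2F7", "memory_size_mb": 256, "num_vcpus": 1},
 {"id": "5018A2223FC32", "memory_size_mb": 128, "num_vcpus": 1},
 {"id": "501883404870A", "memory_size_mb": 256, "num_vcpus": 1}
]
"""

rows = json.loads(raw_output)

# Aggregate and filter the rows like any list of dictionaries.
total_memory_mb = sum(row["memory_size_mb"] for row in rows)
small_ids = [row["id"] for row in rows if row["memory_size_mb"] < 256]

print(total_memory_mb, small_ids)  # 640 ['5018A2223FC32']
```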

VDK Notebook Getting Started

Introduction to the development of VDK Jobs using Notebooks.

  • Learn how to create data jobs
  • Learn how to deploy data jobs

See more in https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-plugins/vdk-jupyter/getting-started.ipynb

VDK SQL Notebook Cell


VDK Errors APIs

(Relevant for plugin developers)
VDK is deprecating the use of errors.log_and_rethrow and errors.log_and_throw in favour of:

errors.report(error_type, exception: BaseException)
errors.report_and_throw(exception: BaseVdkException)
errors.report_and_rethrow(error_type, exception: BaseException)

The aim is to reduce "double" logging and verbosity of logs.

Control Service

Add support for multiple jwt issuers

A new property, security.oauth2.jwtIssuerUris, is introduced and replaces jwtIssuerUrl.

security:
   oauth2:
      
      ## [ Required if security.enabled = True ]
      ## Deprecated in favor of jwtIssuerUris.
      jwtIssuerUrl: ""
      ## [Required if security.enabled = True]
      ## Comma separated list of issuers to use.
      jwtIssuerUris: ""
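
To illustrate what a comma-separated issuer list means in practice, the sketch below splits such a value and checks a token's iss claim against it. It is not the Control Service implementation; the function names and the example issuer URLs are hypothetical.

```python
def parse_issuer_uris(value: str) -> list[str]:
    """Split a comma-separated issuer list, dropping blanks and surrounding whitespace."""
    return [uri.strip() for uri in value.split(",") if uri.strip()]


def is_trusted_issuer(token_issuer: str, jwt_issuer_uris: str) -> bool:
    """Return True when the token's `iss` claim matches one of the configured issuers."""
    return token_issuer in parse_issuer_uris(jwt_issuer_uris)


# Hypothetical value for security.oauth2.jwtIssuerUris.
configured = "https://idp-a.example.com, https://idp-b.example.com"

print(is_trusted_issuer("https://idp-b.example.com", configured))
print(is_trusted_issuer("https://unknown.example.com", configured))
```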

Implement Webhook APIs authentication

Enable setting system or service account for Control Service webhook authentication

This introduces 2 new properties for each webhook:

      authorizationServerEndpoint: ""
      authorizationRefreshToken: ""

If set, they are used when making the HTTP webhook request. Otherwise, the request falls back to the user-provided authentication token.

What's Changed

Full Changelog: v1.2...v1.3

Versatile Data Kit 1.2

30 Aug 14:08
43f4cf8

Major features include:

New Github Landing Page

The new landing page of our open-source project aims to make it much easier for users to see and understand what VDK is and what they can do with it.

Check it out at https://github.com/vmware/versatile-data-kit

Control Service improvements

Operators can set builder image per Python version

Operators can easily control the image of:

  • The operator-managed VDK (system) library
  • The base image used to build the user data job
  • And now the builder image with which the user data job is built

deploymentSupportedPythonVersions:
  3.9:
    baseImage: "registry.hub.docker.com/versatiledatakit/data-job-base-python-3.7:latest"
    vdkImage: "registry.hub.docker.com/versatiledatakit/quickstart-vdk:release"
    builderImage: "registry.hub.docker.com/versatiledatakit/job-builder:latest"

More information can be found in the Control Service Helm Chart documentation

Operators can configure the service to automatically ignore files on deploy

When users deploy a job, operators can control which files are accepted, and either return an error or silently drop the rest.
This improves security while letting operators change the rules without directly impacting users:

# Instead to allow only sql and ini text files specify "text/x-sql,text/x-ini"
# Full list of file types are documented in https://tika.apache.org
# If set to empty, then all file types are allowed.
uploadValidationFileTypesAllowList: ""

# List of file extensions that are allowed to be uploaded. Comma separated list e.g: "py,csv,sql"
# only files with extensions that are present in this list will be allowed to be uploaded.
# if the list is empty all extensions are allowed.
uploadValidationFileExtensionsAllowList: ""

# Works as the uploadValidationFileTypesAllowList above, only it deletes the files instead of failing
# the job upload. Runs before the allow list, therefore if only files of the same types are present in
# both lists, job upload will succeed.
uploadValidationFileTypesFilterList: ""

# List of file extensions that are automatically deleted from data job source code before upload.
# Comma separated list e.g: "pyc,exe,sh". If the list is empty no files will be deleted.
# Files are first deleted before the allow list performs its checks.
uploadValidationFileExtensionsFilterList: ""

More information can be found in the Control Service Helm Chart documentation
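
The ordering described in the comments above (filter list first, allow list second) can be sketched for the extension-based settings as follows. This is an illustration of the documented behaviour only, not the Control Service code; validate_upload and parse_list are hypothetical names.

```python
from pathlib import PurePosixPath


def parse_list(value: str) -> set[str]:
    """Parse a comma-separated setting such as "py,csv,sql" into a set."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def validate_upload(files, filter_list="", allow_list=""):
    """Drop filtered extensions first, then enforce the allow list on what remains."""
    filtered = parse_list(filter_list)
    allowed = parse_list(allow_list)

    def ext(name):
        return PurePosixPath(name).suffix.lstrip(".").lower()

    # Step 1: files matching the filter list are silently removed.
    kept = [f for f in files if ext(f) not in filtered]
    # Step 2: an empty allow list accepts everything; otherwise reject unknown extensions.
    rejected = [f for f in kept if allowed and ext(f) not in allowed]
    if rejected:
        raise ValueError(f"File extensions not allowed: {rejected}")
    return kept


accepted = validate_upload(
    ["10_step.py", "cache.pyc", "data.csv"],
    filter_list="pyc,exe,sh",
    allow_list="py,csv,sql",
)
print(accepted)  # ['10_step.py', 'data.csv']
```

Because filtering runs first, the upload succeeds even though cache.pyc is not on the allow list.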

New initiative: VDK Run Logs: Simplified And Readable

Take a look at the VEP, which aims to simplify troubleshooting and development with VDK.

We are focused on these goals:

  • Data job run logs provide progress-tracking information
  • User logs stand out
  • Long-running operations (like DAGs) are traceable in the logs
  • The root cause is immediately visible from the logs.
  • Clean Error Handling

Versatile Data Kit Architecture.md

Design architecture of Versatile Data Kit outlining all main interfaces and how they work can be seen at architecture.md

Notebook UI improvements

Add UI element indicating a VDK operation is running

Provides visual feedback to the user when a VDK operation is in progress.


Add icons to vdk operation result dialogs

Enhances user experience by adding icons to result dialog boxes


VDK Login UI: Semi-automated authentication workflow in the Jupyter Notebook

New database POC plugin vdk-duckdb

Check out more at vdk-duckdb

What's Changed


v1.0.1

26 Jul 14:03
64c8696

Major features include:

Secrets Service Helm Chart installation

Vault integration configuration for storing Data Job Secrets has been added to the Helm chart:

secrets:
    vault:
        enabled: false
        uri: "http://localhost:8200"
        
        externalSecretName: ""
        ## Alternatively provide the uri and Approle Settings here. externalSecretName takes precedence if both are set.
        approle:
            roleid: foo
            secretid: foo

        sizeLimitBytes: "1048576"

VDK Secrets CLI

Job secrets are used to store credentials/tokens/sensitive data securely. They can be updated using vdk-control-cli now:

Install vdk-control-cli if needed (it comes pre-installed in quickstart-vdk)

pip install vdk-control-cli 
vdk secrets --help

For example:

     # Set single secret with key "my-key" and value "my-value". If no value is passed you'll get prompted so it's not printed on the screen.
     vdk secrets --set my-key "my-value"

     # Update multiple secrets at once.
     vdk secrets --set "key1" "value1" --set "key2" "value2" --set "secret1" --set "secret2"

Convert Directory-style To Notebook-style Data Job

With the introduction of Notebook-style data jobs, the user has the option to Convert Directory-style to Notebook-style Data Job.

[Screenshots: the VDK menu; the pop-up window; the resulting notebook, whose first part shows a description and instructions on the conversion, followed by how each file step was converted]

VDK Jupyter Extension published in PyPi

Users can now install the Jupyter extension with VDK in their own Python and Jupyter environment with a single line:

pip install vdk-jupyterlab-extension

Then start Jupyter lab as usual:

jupyter lab

The VDK menu is then available in the Jupyter lab UI.

New plugin: vdk-smarter

VDK Smarter introduces proof of concept (pre-alpha) integration with OpenAI.

In the POC it does a review of all SQL queries managed by VDK.

For more details see the plugin home page

What's Changed

  • control-service: Add helm chart entries for Vault Configuation by @dakodakov in #2418
  • control-service: Update contributing.md with correct java requirements by @danail-georgiev in #2430
  • control-service: add configurable smtp host property by @mrMoZ1 in #2411
  • control-service: add helm template for alertmanager by @mrMoZ1 in #2326
  • control-service: add timestamps to helm chart by @DeltaMichael in #2344
  • control-service: better error logging for failed test by @murphp15 in #2374
  • control-service: fix helm chart by @dakodakov in #2449
  • control-service: fix publish-job-base-image script by @mivanov1988 in #2473
  • control-service: fix typo in helm chart read only root filesystem property by @mrMoZ1 in #2476
  • control-service: install necessary dependencies to job builder secure by @mivanov1988 in #2472
  • control-service: job-builder using kaniko fix by @tozka in #2429
  • control-service: job-builder-secure using kaniko fix by @tozka in #2447
  • control-service: logs endpoint doesn't hang by @murphp15 in #2370
  • control-service: prevent integer translation in helm chart by @dakodakov in #2470
  • control-service: push to multiple registries by @tozka in #2381
  • control-service: release job builder in 2 repos by @tozka in #2413
  • control-service: remove default vault token by @dakodakov in #2475
  • control-service: remove unused dependency influxdb by @tozka in #2388
  • control-service: run integration tests on multiple namespace. by @murphp15 in #2446
  • control-service: set Execution and JobQuery APIs to stable by @tozka in #2417
  • control-service: split build job base image CI/CD step by @mivanov1988 in #2348
  • control-service: switch to Approle Vault authentication by @dakodakov in #2435
  • control-service: use full url for heartbeat tests and heartbeat tests run in multiple namespaces by @murphp15 in #2295
  • frontend: Fix navigation in Data Jobs by @gorankokin in #2356
  • frontend: Fix router event handling in base class by @gorankokin in #2375
  • frontend: bump toolchain versions in frontend build docker image by @DeltaMichael in #2358
  • frontend: enable stable tagging by @DeltaMichael in #2378
  • frontend: fix data-pipelines build scripts by @DeltaMichael in #2389
  • frontend: push docker images to both repos by @tozka in #2390
  • frontend: quickstart-vdk operability tests using cypress by @DeltaMichael in #2359
  • frontend: remove e2e tests restrictions by @DeltaMichael in #2386
  • support: slack notification on pipeline failure by @DeltaMichael in #2338
  • vdk-control-cli: add vdk secrets command by @dakodakov in #2342
  • vdk-control-cli: add vdk secrets command by @dakodakov in #2357
  • vdk-control-cli: remove set-secret for properties by @dakodakov in #2409
  • vdk-core: Allow different python versions for vdk docker images by @doks5 in #2346
  • vdk-core: Set sender when checking if email exists by @doks5 in #2376
  • vdk-core: [Hot Fix] Stop throwing exceptions if config.ini not present by @doks5 in #2367
  • vdk-heartbeat: cover requirements.txt automatic installs by @tozka in #2393
  • vdk-impala: Truncate table before inserting data by @sbuldeev in #2369
  • vdk-impala: Update README.md for vdk-impala by @sbuldeev in #2355
  • vdk-impala: support also pydantic 1.0 by @tozka in #2368
  • vdk-impala: upgrade code to support pydantic 2.0 by @tozka in #2362
  • vdk-ipython: README.md fix by @duyguHsnHsn in #2345
  • vdk-jupyter: fix server error in jupyter ui and remove unneeded code by @duyguHsnHsn in #2361
  • vdk-jupyter: Add a message describing how to contact the Jupyter devs by @gageorgiev in #2414
  • vdk-jupyter: Create init cell when opening new notebook by @gageorgiev in #2352
  • vdk-jupyter: Sample job notebook step by @gageorgiev in #2364
  • vdk-jupyter: add Convert Job To Notebook UI button by @yonitoo in #2329
  • vdk-jupyter: convert job operation by @duyguHsnHsn in #2406
  • vdk-jupyter: publish image to pip registry by @murphp15 in #2407
  • vdk-jupyter: remove delete operation by @duyguHsnHsn in #2428
  • vdk-plugin-control-cli: add secrets command by @dakodakov in #2387
  • vdk-plugins: fix build of multiple plugins by @tozka in #2445
  • vdk-plugins: include Ingestion hooks documentation by @tozka in h...

Versatile Data Kit 1.0

28 Jun 14:00
0918a53

Major features include:

VDK Operations UI

VDK Operations UI is a browser application that allows users to manage and monitor data jobs. It ships as part of quickstart-vdk and is available to users who run quickstart-vdk locally.

Users can now:

  • View the overall health of their data jobs
  • Enable/disable/re-run data jobs
  • Have a list of their data jobs and view their deployment status, latest execution status, success rate, etc.
  • Have easy access to individual data job details, such as description, schedule, notifications, and data job source code
  • View details for each execution of a data job, e.g. the number of executions, job versions for each execution, execution duration, etc.

For more information about the architecture, check out VEP-1507.


Control Service Secrets API

With the release of Secrets API, users can now securely store sensitive data such as passwords, credentials, tokens, ensuring compliance with industry standards and reducing the risk of unauthorized access and data breaches.

The new Secrets API allows users to configure a Vault instance in the Control Service, enabling the storage and retrieval of secrets for data jobs. Data jobs can now easily set and retrieve secrets during runtime, enhancing security and enabling seamless integration with third-party systems.

To store and retrieve secrets, we have introduced new API methods under the path

/data-jobs/for-team/{team_name}/jobs/{job_name}/deployments/{deployment_id}/secrets

Users can make GET requests to retrieve secrets and PUT requests to update secrets for a specific data job deployment.
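
As a rough sketch, the requests could be prepared with the Python standard library as shown below. Only the path template comes from the text above; the base URL, the bearer-token header, the JSON body shape (a flat key/value map), and the example identifiers are assumptions, so consult the API documentation before relying on them.

```python
import json
import urllib.request
from urllib.parse import quote

# Path template as given in the release notes.
SECRETS_PATH = "/data-jobs/for-team/{team_name}/jobs/{job_name}/deployments/{deployment_id}/secrets"


def secrets_url(base_url, team_name, job_name, deployment_id):
    """Fill the path template, URL-encoding each path segment."""
    path = SECRETS_PATH.format(
        team_name=quote(team_name, safe=""),
        job_name=quote(job_name, safe=""),
        deployment_id=quote(deployment_id, safe=""),
    )
    return base_url.rstrip("/") + path


def build_request(url, token, secrets=None):
    """Prepare a GET when `secrets` is None, otherwise a PUT with a JSON body (assumed shape)."""
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    data = None
    method = "GET"
    if secrets is not None:
        data = json.dumps(secrets).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "PUT"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical base URL and identifiers; nothing is sent until urlopen() is called.
url = secrets_url("http://localhost:8092", "my-team", "my-job", "production")
read_request = build_request(url, token="<access-token>")
update_request = build_request(url, token="<access-token>", secrets={"api-key": "value"})
```

Passing either request to urllib.request.urlopen would perform the call against a running Control Service.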

For more details on API usage and examples, please refer to our documentation.

vdk-impala: Introduce checks for snapshot and insert template

With the introduction of snapshot and insert template checks, we can now ensure the quality and correctness of the data before it is inserted into the target table.

Previously, the processing step checks were unable to validate the semantics of the data, potentially allowing erroneous data to be inserted. With the new checks in place, we have better control over the data integrity and can prevent unwanted behavior.

Here's an example of how to use the checks:

    def sample_check(tmp_table_name):
        # Reject the staged data when the temporary table name contains "bad".
        return "bad" not in tmp_table_name

    template_args["check"] = sample_check
    job_input.execute_template(
        template_name="snapshot",
        template_args=template_args,
    )

What's Changed

  • control-service: better error logging allowing to understand failing test by @murphp15 in #2184
  • control-service: Python image based on Photon OS by @mivanov1988 in #2243
  • control-service: ability to send authenticated email notifications by @mrMoZ1 in #2294
  • control-service: add secrets API by @dakodakov in #2171
  • control-service: add tmp dir path to image deployer's env variables by @mrMoZ1 in #2244
  • control-service: data jobs points to correct namespace by @murphp15 in #2268
  • control-service: fix failing pipelines by @murphp15 in #2296
  • control-service: infer correct namespace if not set by @tozka in #2277
  • control-service: install kubectl by @murphp15 in #2290
  • control-service: introduce latest and stable tags for docker images by @DeltaMichael in #2138
  • control-service: make kubernetes service easy to test. by @murphp15 in #2249
  • control-service: move cron jobs methods to the data jobs class by @murphp15 in #2291
  • control-service: move cron jobs methods to the data jobs class by @murphp15 in #2293
  • control-service: multiple namespaces in testing by @murphp15 in #2269
  • control-service: produce secure base job images for python 3.8-3.11 by @mivanov1988 in #2208
  • control-service: remove spammy logs by @tozka in #2278
  • control-service: remove unneeded methods by @murphp15 in #2260
  • control-service: remove unused properties by @murphp15 in #2262
  • control-service: secrets service implementation by @dakodakov in #2241
  • control-service: secrets service integration test by @dakodakov in #2289
  • control-service: secrets service unit tests by @dakodakov in #2276
  • control-service: use real class when testing instead of mock by @murphp15 in #2261
  • examples: Add Supported Python Versions Example by @doks5 in #2288
  • frontend: add null checks for optional configs by @DeltaMichael in #2193
  • frontend: disable stable tagging for ui docker images by @DeltaMichael in #2240
  • frontend: ping frontend on docker image release by @DeltaMichael in #2101
  • specs: VEP-2272 Complete Data Job Configuration Persistence Part 2 by @mivanov1988 in #2302
  • specs: VEP-2272 Complete Data Job Configuration Persistence by @mivanov1988 in #2287
  • vdk-control-cli: Allow extensions to specify a sample job by @gageorgiev in #2177
  • vdk-control-cli: Test only on 3.7 and 3.11 by @gageorgiev in #2230
  • vdk-core: Accept string as job_path in JobConfig by @doks5 in #2251
  • vdk-core: Add python version disparity warning by @doks5 in #2242
  • vdk-core: Add python_version configuration to config-help by @doks5 in #2271
  • vdk-core: Improve log message for python version disparity by @doks5 in #2250
  • vdk-core: Update JobConfig to match vdk-control-cli JobConfig by @doks5 in #2226
  • vdk-core: adapt to recent pluggy changes by @dakodakov in #2317
  • vdk-core: add configurable write directory value by @mrMoZ1 in #2206
  • vdk-core: add vdk sdk secrets api - part I by @dakodakov in #2318
  • vdk-core: add vdk sdk secrets api - part III by @dakodakov in #2325
  • vdk-impala: Introduce checks for insert template by @sbuldeev in #2198
  • vdk-impala: Introduce checks for snapshot template by @sbuldeev in #2040
  • vdk-jupyter: Allow for creating a job with a notebook step by @gageorgiev in #2172
  • vdk-jupyter: Fix job creation by @gageorgiev in #2245
  • vdk-jupyter: fix build by pinning every package to a specific version by @duyguHsnHsn in #2186
  • vdk-jupyter: installation and build by @duyguHsnHsn in #2319
  • vdk-jupyter: pin jupyterlab to 3.6.3 in pyproject.toml by @duyguHsnHsn in #2292
  • vdk-jupyter: pin tsc to specific version by @duyguHsnHsn in #2220
  • vdk-jupyter: small fixtures on the ui by @duyguHsnHsn in #2161
  • vdk-notebook: handle job with mixed .ipynb, .py, .sql files use-case by @duyguHsnHsn in #2279
  • vdk-plugin-control-cli: better error logging by @murphp15 in #2185
  • vdk-test-utils: add vdk sdk secrets api - part 2 by @dakodakov in #2320
  • versatile-data-kit: Update .gitlint by @tozka in #2266
  • versatile-data-kit: add pr title checker by @tozka in #2270
  • versatile-data-kit: ignore patch updates in dependabot by @tozka in #2328

Full Changelog: v0.14...v1.0

Versatile Data Kit 0.14

31 May 14:17
4220970

Major features include:

VDK DAG plugin release

VDK DAG (previously vdk-meta-jobs) is the official name of the plugin that allows users to express dependencies between data jobs. It is released as Beta, with improved stability, usability, and documentation.

See the plugin page for more.

Versatile Data Kit UI Shareable Web links

Now users can share links with filters applied:

  • Data Jobs lists (Manage and Explore screens) are shareable through the URL, as every applied filter is persisted to the URL and vice versa
  • Data Job Executions screen filters and sort parameters are shareable through the URL, as every applied filter or sort is persisted to the URL and vice versa

VDK UI configuration improvements and easier getting started with quickstart-vdk

Users can now access VDK UI using quickstart-vdk. VDK UI is made to be much more configurable:

  • Toggleable authentication (default: enabled) using the 'skipAuth' flag.
  • Configuration of authentication parameters.
  • Ability to specify visual elements displayed, e.g., navigation button to the Explore page.

VDK Control CLI supports python version

Users can now specify the Python version their job needs when it is deployed in the VDK Control Service runtime:

vdk deploy --python-version 3.7 ..

Or in the job's config.ini:

[job]
python_version = 3.7

Users can also see which Python versions the VDK Control Service currently supports:

vdk info

would return something like

Getting control service information...
VDK Control service version: PipelinesControlService/0.0.1-SNAPSHOT/5f078fe ...
Supported python versions:
3.9
3.8
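
Before deploying, it can be useful to compare the version requested in config.ini with the versions the service reports. A small sketch using the values from the examples above; the list of supported versions is copied by hand from the sample vdk info output rather than queried from the service.

```python
import configparser

# Job configuration as in the config.ini example above.
CONFIG_INI = """
[job]
python_version = 3.7
"""

# Versions as listed in the sample `vdk info` output above.
SUPPORTED = ["3.9", "3.8"]


def requested_python_version(config_text: str):
    """Return the python_version from the [job] section, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("job", "python_version", fallback=None)


version = requested_python_version(CONFIG_INI)
is_supported = version in SUPPORTED
print(version, is_supported)  # 3.7 False
```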

What's Changed

  • control-service: Clean up old data job configurations by @doks5 in #2075
  • control-service: Fix backwards-compatibility issues by @doks5 in #2022
  • control-service: Only CLI executions are "Manual" by @gageorgiev in #1763
  • control-service: Rework supported python version logic by @doks5 in #1992
  • control-service: Swagger UI quickstart-vdk server config by @ivakoleva in #2062
  • control-service: [Bug fix] Fix supported python versions helm configuration by @doks5 in #1964
  • control-service: a clear error message on how to handle the failed pipeline by @murphp15 in #2127
  • control-service: add ability to check if docker image exists in ecr by @mrMoZ1 in #1977
  • control-service: allow more time to reach a complete state by @murphp15 in #2143
  • control-service: append integration test name to job name by @mivanov1988 in #2093
  • control-service: better error logging and pull private image in private test by @murphp15 in #2156
  • control-service: better error message by @murphp15 in #2094
  • control-service: better error message from throwable by @murphp15 in #2157
  • control-service: clarify build steps by @dakodakov in #1959
  • control-service: code expected to run in transaction now runs in transaction by @murphp15 in #2117
  • control-service: custom app config values can we passed to helm. by @murphp15 in #2004
  • control-service: delete unused method by @murphp15 in #2038
  • control-service: disable authorization on test/cicd deployment by @tozka in #2129
  • control-service: disable failing test by @murphp15 in #2086
  • control-service: fail tests fast by @murphp15 in #2137
  • control-service: fix api declaration by @murphp15 in #1974
  • control-service: fix oom tests by @murphp15 in #2028
  • control-service: handle null started by value by @murphp15 in #2151
  • control-service: if a test is in a bad state it fails straight away by @murphp15 in #2098
  • control-service: include details in error message by @murphp15 in #2122
  • control-service: increase CICD deployment resources by @tozka in #2130
  • control-service: killed job was shown as successful by @mivanov1988 in #2116
  • control-service: latest version of gradle and spring /remove old comment by @murphp15 in #1976
  • control-service: logs url can include team name by @murphp15 in #2013
  • control-service: new python client. by @murphp15 in #1983
  • control-service: print response body on error by @murphp15 in #2113
  • control-service: remove a test that is testing behaviour that doesn't exist by @murphp15 in #2031
  • control-service: remove unused parameter by @murphp15 in #2027
  • control-service: remove unused parameters by @murphp15 in #2016
  • control-service: see more details when there is an error by @murphp15 in #2050
  • control-service: update ecr credentials integration test by @mrMoZ1 in #2079
  • control-service: upgrade python client by @murphp15 in #2076
  • control-service: use git for images by @murphp15 in #2097
  • docs: add getting started section for quickstart-vdk and ui by @DeltaMichael in #2019
  • frontend: Bugfix in e2e plugins function and bump major versions for UI libs by @gorankokin in #1994
  • frontend: Fix for e2e tests by @gorankokin in #2030
  • frontend: Implement executions list enhacements by @gorankokin in #2126
  • frontend: Improve visibility of 'User error' messages by @hzhristova in #1960
  • frontend: Job sharable executions filter and sort by @gorankokin in #2072
  • frontend: Toggleable auth by @ivakoleva in #1958
  • frontend: Upgrade lineage to beta version by @hzhristova in #1991
  • frontend: shareable links with query params for Data Jobs grids by @hzhristova in #2049
  • frontend: visibility of app components is configurable by @DeltaMichael in #1978
  • quickstart-vdk: ignore explore page and widgets in frontend by @DeltaMichael in #2073
  • vdk-airflow: fix failing tests by @murphp15 in #2078
  • vdk-control-cli: Add support for python_version by @doks5 in #2002
  • vdk-control-cli: Add support for python_version in config by @doks5 in #2023
  • vdk-control-cli: add vdk info command to list of cli commands by @dakodakov in #2069
  • vdk-control-cli: import the latest version of the client into cli by @murphp15 in #1969
  • vdk-control-cli: upgrade python client by @murphp15 in #2077
  • vdk-control-cli: use explicit parameter names by @murphp15 in #1975
  • vdk-dag: DAGs propagate their execution type to their component jobs by @yonitoo in #2080
  • vdk-dag: Drop deprecation warnings by @gageorgiev in #2012
  • vdk-dag: Fix config bug by @gageorgiev in #2029
  • vdk-dag: Rename vdk-meta-jobs to vdk-dag by @gageorgiev in #1831
  • vdk-dag: fix plugin name of DAGs example README.md by @yonitoo in #1945
  • vdk-dag: improve DAGs docs and example by @yonitoo in #1984
  • vdk-dag: update VEP about the execution type propagation by @yonitoo in #2095
  • vdk-examples: Change Meta Jobs to DAGs in examples by @gageorgiev in #2024
  • vdk-gdp-execution-id: example added by @ivakoleva in #1962
  • vdk-heartbeat: add t...

Versatile Data Kit 0.13

26 Apr 13:55
b2a4049

Major features include:

New plugin: vdk-gdp-execution-id

An installed Generative Data Pack (GDP) plugin automatically expands the data sent for ingestion.

This GDP plugin detects the execution ID of the running Data Job and decorates your data product with it, so that a data record can be correlated with a particular ingestion Data Job execution ID.

For more information see the plugin documentation

vdk-dag: pass arguments to jobs in a DAG

Each job in a DAG can now be passed arguments:

{
    "job_name": "name-of-job",
    "team_name": "team-of-job",
    "fail_meta_job_on_error": false,
    "arguments": <ARGUMENTS IN DICTIONARY FORMAT HERE>,
    "depends_on": ["name-of-job1", "name-of-job2"]
}
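
A concrete job list with arguments filled in might look like the sketch below. The job names, team name, and argument keys are made up for illustration, and undefined_dependencies is a hypothetical helper for sanity-checking the list, not part of the plugin.

```python
import json

# Hypothetical DAG definition: each entry follows the structure shown above.
jobs = [
    {
        "job_name": "ingest-job",
        "team_name": "my-team",
        "fail_meta_job_on_error": True,
        "arguments": {"table_name": "raw_events"},
        "depends_on": [],
    },
    {
        "job_name": "transform-job",
        "team_name": "my-team",
        "fail_meta_job_on_error": False,
        "arguments": {"source_table": "raw_events", "limit": 1000},
        "depends_on": ["ingest-job"],
    },
]


def undefined_dependencies(job_list):
    """Return dependency names that do not match any job_name in the list."""
    names = {job["job_name"] for job in job_list}
    referenced = {dep for job in job_list for dep in job["depends_on"]}
    return sorted(referenced - names)


missing = undefined_dependencies(jobs)
print(json.dumps(jobs, indent=2))
```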

vdk-notebook: VDK job input in vdk cells

Users can develop jobs entirely in a Notebook file with all VDK features available out of the box.
After installing vdk-notebook, users have access to the job_input interface to execute templates, ingest data, and more.


vdk-notebook: vdk and non-vdk cells

To separate production and development code, the vdk-notebook integration lets users mark which cells are deployable and part of their production code and which are not.


quickstart-vdk now includes the Operations UI

When installing quickstart-vdk, VDK Server is available for local testing and now includes the UI:

pip install quickstart-vdk
vdk server --install 

For more information see here

Versatile Data Kit Frontend npm libraries release

The Versatile Data Kit Frontend provides 2 npm (Angular) libraries which can be used to integrate the VDK UI with your own screens:

  • @versatiledatakit/data-pipelines
    Versatile Data Kit Data Pipelines library provides UI screens that help to manage data jobs via Versatile Data Kit Control Service
  • @versatiledatakit/shared
    Versatile Data Kit Shared library enables reusability of shared features like: NgRx Redux, Error Handlers, Utils, Generic Components, etc.

What's Changed
