Add a lightweight Namespace Management and KV Store #3609

Open
anna-geller opened this issue Apr 23, 2024 · 0 comments
Labels
enhancement New feature or request

Feature description

The key value store will be implemented on top of internal storage for the following reasons:

  1. Privacy: we want Kestra to never store users' private data. This means that all values will be stored in the user’s private cloud storage bucket, and Kestra's database will contain only metadata about each value, such as the key, file URI, any metadata attached to the object, TTL, creation date, last updated timestamp, etc.
  2. Ease of implementation/migration: users can easily switch from open-source to cloud/EE because the implementation and data storage will be the same regardless of whether Kestra runs on top of Kafka or JDBC.

Keys and values

Keys are arbitrary strings. They can be all caps or mixed case, whichever the user prefers. The maximum length of a key should follow the same limit we use for Secrets.

Values are JSON strings serialized as JSON files in internal storage.
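
To make the split between metadata and values concrete, here is a minimal sketch in Python. It is not Kestra's implementation: the `KVMetadata` class, the `put`/`get` helpers, the `kv://` URI scheme, and the use of plain dicts for the database and internal storage are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class KVMetadata:
    """Hypothetical database row: describes the value but never contains it."""
    namespace: str
    key: str
    uri: str
    created: datetime
    updated: datetime


def put(storage: dict, database: dict, namespace: str, key: str, value) -> KVMetadata:
    """Write the value as a JSON document to storage; record only metadata in the database."""
    uri = f"kv://{namespace}/{key}.json"  # assumed URI scheme, for illustration only
    storage[uri] = json.dumps(value)
    now = datetime.now(timezone.utc)
    previous = database.get((namespace, key))
    created = previous.created if previous else now  # keep the original creation date on update
    meta = KVMetadata(namespace, key, uri, created, now)
    database[(namespace, key)] = meta
    return meta


def get(storage: dict, database: dict, namespace: str, key: str):
    """Look up the file URI in the database, then deserialize the JSON from storage."""
    meta = database[(namespace, key)]
    return json.loads(storage[meta.uri])
```

With this layout, the database row carries the key, URI, and timestamps, while the payload lives only in the storage bucket.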

Namespace binding

Key-value pairs are tied to a namespace. There is a short-term and a long-term implementation to enable a namespace-level KV Store.

Short-term

In the short term, all key-value pairs will be tied to the current namespace. This means that:

  • when you create new KV pairs or overwrite existing ones using the kv.Set task, they will be tied to the namespace of the flow from which they were created
  • when you retrieve values by key using the kv.Get task, they will be retrieved from the current namespace.

Long term

In the long term, users should be able to create and read KV pairs across namespaces as long as they have the right permissions to access those namespaces. TL;DR: this issue must be done first to implement the long-term solution: https://github.com/kestra-io/kestra-ee/issues/1099.


Namespace management in OSS

To enable namespace binding in the OSS edition, we’ll introduce a lightweight version of Namespace Management (currently available only in EE) to the open-source edition, including the Overview tab and the KV Store: https://www.figma.com/file/ew0uXk0NRXJ2NBBJTNe2n1/UI?type=design&node-id=1465-18728&mode=design&t=TmrlE3Cx7ewyBHYl-0

[Image: Namespaces_OSS-restrictions]

All other tabs will be greyed out, with a hint that they are available in EE if the user needs them.

KV Store UI

The KV Store UI will look a lot like the Secrets UI, with a “New KV pair” button at the top to create a new KV pair.

https://www.figma.com/file/ew0uXk0NRXJ2NBBJTNe2n1/UI?type=design&node-id=1456%3A18323&mode=design&t=Wm5ld1tT8VcTcGL9-1


You can create, read, update, or delete KV pairs using:

  • the UI
  • the API
  • the Terraform resource namespace_kv
  • tasks.

KV Store core plugin

Set (or modify) a KV pair

id: set_kv
type: io.kestra.core.tasks.kv.Set
key: myvariable
value: '{{ {"myfile": outputs.download.uri} }}' # single-quoted so the inner double quotes remain valid YAML
namespace: dev # the current namespace of the flow can be used by default
overwrite: true # whether to overwrite or fail if a value for that key already exists; default true
ttl: P30D # optional TTL 

Get a KV pair

The easiest way to retrieve a value by key is to use the Pebble function following this syntax:

{{ kv('VARIABLE_NAME', namespace_name, errorOnMissing_boolean) }}
# for example, to retrieve the previously created "myfile":
{{ kv('myvariable').myfile }} # assuming you retrieve it in a flow in the same namespace as the one for which key was created

If you prefer, you can also retrieve the value using a task:

id: get_value_by_key
type: io.kestra.core.tasks.kv.Get
key: myvariable
namespace: dev # the current namespace of the flow can be used by default
errorOnMissing: false # bool

And if you want to check if some values already exist for a given key, you can search keys by prefix:

id: get_keys_by_prefix
type: io.kestra.core.tasks.kv.GetKeys
prefix: "myvar"
namespace: dev # the current namespace of the flow can be used by default
errorOnMissing: false # bool

The output will be a list of keys. If no keys are found, an empty list will be returned.
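
The prefix search can be sketched in Python as below, assuming a store keyed by (namespace, key) tuples. The function name `get_keys` and the sorted output order are assumptions made for illustration.

```python
def get_keys(store: dict, namespace: str, prefix: str) -> list[str]:
    """Return all keys in the namespace that start with the prefix (empty list if none)."""
    return sorted(
        key for (ns, key) in store
        if ns == namespace and key.startswith(prefix)
    )
```

Keys that match the prefix but live in another namespace are not returned, which matches the short-term namespace binding.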

Delete a KV pair

id: delete_kv_pair
type: io.kestra.core.tasks.kv.Delete
key: myvariable
namespace: dev # the current namespace of the flow can be used by default
errorOnMissing: false 

On EE, we need a dedicated permission (might be called KVSTORE) to allow fine-grained access to create, read, update or delete KV pairs on specific namespaces.


Extra notes

  1. Given that all values are stored in internal storage, no payload limit is required.
  2. The TTL will be evaluated lazily: expiry is checked only when the user tries to retrieve the value. If the value is past its TTL at that point, the key will be deleted, and we'll return null plus a friendly message explaining that the key has expired.
  3. The Purge task cannot be used to purge old keys as Purge is tied to executions. We'll need to add a new task, e.g., PurgeKV, to support purging expired keys (or all keys past a certain creation date if needed).
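
The lazy TTL evaluation from note 2 can be sketched in Python as follows. The `LazyTTLStore` class and its return shape (a value plus an optional message) are assumptions made for illustration. The TTL is passed as a `timedelta` rather than parsed from an ISO 8601 string such as P30D.

```python
from datetime import datetime, timedelta, timezone


class LazyTTLStore:
    """Hypothetical in-memory store: expiry is checked only when a key is read."""

    def __init__(self):
        self._entries = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None, now=None):
        now = now or datetime.now(timezone.utc)
        expires_at = now + ttl if ttl is not None else None
        self._entries[key] = (value, expires_at)

    def get(self, key, now=None):
        """Return (value, message); value is None if the key is missing or expired."""
        now = now or datetime.now(timezone.utc)
        entry = self._entries.get(key)
        if entry is None:
            return None, f"Key '{key}' not found."
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            # Nothing runs in the background: the expired key is deleted on this read.
            del self._entries[key]
            return None, f"Key '{key}' expired at {expires_at.isoformat()} and was deleted."
        return value, None
```

Because expired keys that are never read stay in the store under this scheme, a separate purge step such as the PurgeKV task proposed in note 3 is still needed.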

Extra context

Why not just use State Store?

State Store is challenging to use. Things that are hard to do with it include:

  1. Being able to see what values are persisted across flows and namespaces
  2. Being able to inspect those values from the UI (see the value for a given key)
  3. Being able to see when the key was initially created and when it was last updated
  4. Being able to set the type for that saved value
  5. (TBD later scope) Potentially also being able to react to changes in the state store as a simple decoupling mechanism

Use cases

Use cases this will enable:

  1. Keep the last timestamp scraped from an API
  2. Keep the last message or file processed to easily determine whether some new processing should take place or not
  3. (TBD later scope) A KV change generates an audit log entry, which will make it possible to, e.g., take action whenever the value has changed
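
The first two use cases follow the same pattern: read the last processed marker from the KV store, process only newer items, then write the new marker back. A minimal Python sketch, assuming a dict stands in for the namespace's KV store; the function name, the `last_scraped_ts` key, and the record shape are hypothetical.

```python
def process_new_records(kv: dict, records: list[dict], key: str = "last_scraped_ts") -> list[dict]:
    """Return only records newer than the stored timestamp, then advance the timestamp."""
    last_seen = kv.get(key)
    # ISO 8601 timestamps in the same format and time zone compare correctly as strings.
    fresh = [r for r in records if last_seen is None or r["ts"] > last_seen]
    if fresh:
        kv[key] = max(r["ts"] for r in fresh)
    return fresh
```

Running the function twice over the same records returns them on the first call and an empty list on the second, because the stored timestamp has advanced.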