What is the best way to run a Meltano pipeline in 1000 variations? #7257
This is probably a bit of an aside, given it's not a wholly Meltano solution, but it might be of interest. I think we could do this in our environment as of now; we run Meltano via AWS ECS. We already run many Airflow tasks using the same connections, but split across multiple tasks to sync different sets of tables/endpoints. To spin up lots of tasks, we could pass container overrides like:

```python
'command': [
    'string',
],
'environment': [
    {
        'name': 'string',
        'value': 'string'
    },
]
```

Tee up each task in your DAG with different parameters, i.e.:

```python
'command': [
    'meltano', 'run', 'tap1', 'target1', ...
],
'environment': [
    {
        'name': 'TAP1_HOST',
        'value': 'tiger'
    },
    {
        'name': 'TAP1_PORT',
        ...
    },
]
```

Shared credentials could be set in the …
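A minimal sketch of how those overrides might be assembled for `boto3`'s `ecs.run_task` (the container name, cluster, task definition, and the concrete host/port values are placeholder assumptions, not from this thread):

```python
def build_overrides(command, environment):
    """Build the ECS containerOverrides payload for one pipeline variation."""
    return {
        'containerOverrides': [
            {
                'name': 'meltano',  # container name in the task definition (placeholder)
                'command': command,
                'environment': environment,
            },
        ],
    }

# One override payload per variation, each with its own connection settings:
overrides = build_overrides(
    ['meltano', 'run', 'tap1', 'target1'],
    [
        {'name': 'TAP1_HOST', 'value': 'tiger'},
        {'name': 'TAP1_PORT', 'value': '5432'},  # illustrative value
    ],
)

# In the Airflow task this would then be launched along the lines of:
# ecs = boto3.client('ecs')
# ecs.run_task(cluster='my-cluster', taskDefinition='meltano-task',
#              launchType='FARGATE', overrides=overrides)
```

Each DAG task would call this with a different command/environment pair, so the 1000 variations become 1000 ECS task launches sharing one task definition.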
**First challenge: how to invoke the same pipeline with 100 or 1000 different client connection strings or account numbers**

The standard way to implement this as of today would be to declare a `state-id-suffix` in your environment definition, using a unique string for each customer or each variation of the pipeline. This ensures that each variation has its own incremental state tracking.

From one layer above where you invoke Meltano (for instance, an orchestration tool like Airflow, GitHub Actions, etc.), you would then declare a matrix of invocation parameters and simply invoke Meltano once for each of the *n* configuration sets ("configsets").

**Second challenge: how to store and retrieve config for those multiple variations**

Meltano does not solve this as of now. The orchestrator would loop through each configset and invoke Meltano with the proper context and credentials declared in environment variables. Shared config values could be stored in the Meltano environment's systemdb, or in the `meltano.yml` file, but Meltano itself does not handle multiple configsets per environment.