feature: Partial ("Preview") Syncs #8080

anden-akkio · 2023-08-30T19:32:50Z

Feature scope

API

Description

At Akkio, we have a use case where we want to fetch a limited subset of a user-imported dataset. This lets us show users a small preview of what their data might look like after an internal transformation process we run, so they can make changes and confirm the transformation looks right before they commit to the full data sync.

After a brief discussion with @DouweM, it seems this feature doesn't exist in Meltano yet, but it could fall under Meltano's purview. Currently, Meltano forces you to run the full data sync before you can show anything to the user, which makes for a subpar experience if the end output isn't what they want. Split-second queries aren't what Meltano is built for, and they aren't what I'm proposing here, but a preview window of sorts still seems to me like a positive addition that fits with the rest of the tool.

Dropping some other comments that were made, for full context:

One option would be to add a limit config on the Tap SDK to only load a certain number of records, and another would be to add a mode to meltano elt that stops after the first state message is output by the target. Targets flush records and write state messages at different batch sizes (often configurable), and until the first state message comes we can't assume any of the records have made it to the destination. So we couldn't base this on number-of-records on the target or meltano elt side; that'd need to be controlled by the target.

tayloramurphy · 2023-08-30T20:47:47Z

We do have a meltano test command that basically invokes the tap and then kills it after 1 RECORD message comes through. Perhaps a flag to configure the number of records to allow would work?
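To make the "kill it after N records" idea concrete, here is a minimal sketch in Python. It is not how meltano test is implemented; `run_until_n_records` and `FAKE_TAP` are hypothetical names, the fake tap is a stand-in for a real extractor, and the sketch assumes the process writes only Singer JSON messages to stdout.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a tap: one SCHEMA message, then endless RECORDs.
FAKE_TAP = r"""
import itertools, json
print(json.dumps({"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": []}), flush=True)
for i in itertools.count():
    print(json.dumps({"type": "RECORD", "stream": "users", "record": {"id": i}}), flush=True)
"""


def run_until_n_records(cmd, max_records):
    """Run `cmd`, collect its stdout lines, and stop it after `max_records` RECORD messages."""
    collected = []
    seen = 0
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            line = line.strip()
            if not line:
                continue
            collected.append(line)
            # Assumption: every non-empty line is a JSON-encoded Singer message.
            if json.loads(line).get("type") == "RECORD":
                seen += 1
                if seen >= max_records:
                    break
    finally:
        # Stop the process whether or not it has finished on its own.
        proc.terminate()
        proc.wait()
        proc.stdout.close()
    return collected


if __name__ == "__main__":
    for message in run_until_n_records([sys.executable, "-c", FAKE_TAP], max_records=3):
        print(message)
```

Note that this counts records across all streams, so a tap that emits its streams one after another would only ever preview the first one.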

tayloramurphy · 2023-09-06T03:30:29Z

@edgarrmondragon how hard would it be to make the number of records on the test command configurable?

edgarrmondragon · 2023-09-06T17:52:16Z

> @edgarrmondragon how hard would it be to make the number of records on the test command configurable?

@tayloramurphy Well, it uses plugin commands and reports a summary of the results if multiple test commands were selected, similar to pytest. The problem is that it's not working for taps at the moment, because even after adding a test command:

plugins:
  extractors:
  - name: tap-stackexchange
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-stackexchange.git
    commands:
      test:
        args: --test

Meltano doesn't know that it also needs to pass other options, like --config, so it fails if config is required.

I've created an issue to address this: #8112.

Also, this command can't be used in a pipeline as is because it outputs its test summary information to stdout.
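As a rough illustration of why mixed output is a problem for pipelines, the sketch below separates Singer messages from everything else on stdout. The function name `split_singer_output` and the sample summary line are made up for this example, and only the three core Singer message types are checked; a target reading the raw stream would receive the non-message lines as well.

```python
import json

# Core Singer message types; taps may emit others, which this sketch ignores.
SINGER_TYPES = {"SCHEMA", "RECORD", "STATE"}


def split_singer_output(lines):
    """Split stdout lines into (Singer message lines, all other lines)."""
    messages, other = [], []
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            other.append(line)  # plain text, e.g. a human-readable summary
            continue
        if isinstance(parsed, dict) and parsed.get("type") in SINGER_TYPES:
            messages.append(line)
        else:
            other.append(line)
    return messages, other


if __name__ == "__main__":
    stdout_lines = [
        '{"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": []}',
        '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
        "some human-readable test summary line",  # hypothetical, not real tap output
    ]
    messages, other = split_singer_output(stdout_lines)
    print(len(messages), "Singer messages;", len(other), "other lines")
```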

Note
The above changes to Core are not required if one simply uses meltano invoke tap-stackexchange --test.

Note
Even with those changes to Core, the tap would still only emit one record for each stream. Allowing the user to increase that limit would require changes to the SDK, either at the CLI level (something like tap-stackexchange --test=100) or at the config level (like meltano/sdk#1366). The latter is slightly easier because the --test option is already a bit odd in that it also accepts a schema value, --test=schema.
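For illustration only, a per-stream limit larger than one could look roughly like the filter below, applied to the tap's output rather than inside the SDK. The name `cap_records_per_stream` and its `limit` parameter are hypothetical and do not correspond to any existing SDK or Meltano setting.

```python
import json
from collections import Counter


def cap_records_per_stream(lines, limit):
    """Yield Singer message lines, keeping at most `limit` RECORD messages per stream."""
    seen = Counter()
    for line in lines:
        message = json.loads(line)
        if message.get("type") == "RECORD":
            stream = message["stream"]
            if seen[stream] >= limit:
                continue  # drop records beyond the per-stream cap
            seen[stream] += 1
        # SCHEMA, STATE, and any other message types pass through unchanged.
        yield line


if __name__ == "__main__":
    tap_output = [
        json.dumps({"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": []}),
        json.dumps({"type": "RECORD", "stream": "users", "record": {"id": 1}}),
        json.dumps({"type": "RECORD", "stream": "users", "record": {"id": 2}}),
        json.dumps({"type": "RECORD", "stream": "users", "record": {"id": 3}}),
        json.dumps({"type": "RECORD", "stream": "tags", "record": {"id": 1}}),
    ]
    for line in cap_records_per_stream(tap_output, limit=2):
        print(line)
```

One caveat with filtering after the fact: STATE messages still pass through and may describe records that were dropped, so a preview run built this way probably should not persist state. The tap also still extracts every record, which is why a limit inside the tap itself would be preferable.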

anden-akkio added kind/Feature valuestream/Meltano labels Aug 30, 2023
