feature: Partial ("Preview") Syncs #8080

Open
anden-akkio opened this issue Aug 30, 2023 · 3 comments

@anden-akkio
Contributor

anden-akkio commented Aug 30, 2023

Feature scope

API

Description

At Akkio, we have a use case where we want to fetch a limited subset of user-imported datasets. We do this so we can show users a small preview of what their data might look like after an internal transformation process we run, and so they can make changes and confirm that the transformation looks right before committing to the full data sync.

After a brief discussion with @DouweM, it seems this feature doesn't exist in Meltano yet, but it could fall under Meltano's purview. Currently, Meltano forces you to complete the full data sync before you can show the data to the user, which makes for a subpar experience if the end output isn't what they want. Split-second queries aren't what Meltano is built for, nor are they what I'm proposing here, but a preview window of sorts still seems like a positive addition that fits with the rest of the tool.

For full context, here are some other comments from that discussion:

> One option would be to add a limit config to the Tap SDK to load only a certain number of records, and another would be to add a mode to `meltano elt` that stops after the target outputs its first state message. Targets flush records and write state messages at different (often configurable) batch sizes, and until the first state message arrives we can't assume any of the records have reached the destination. So we couldn't base this on a number of records on the target or `meltano elt` side; that would need to be controlled by the target.
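
A minimal Python sketch of the proposed "stop after the first state message" mode, assuming the target follows the usual Singer convention of writing each state object to stdout as one JSON line. `first_state` and `run_until_first_state` are hypothetical helpers, not an existing `meltano elt` option.

```python
import json
import subprocess
from typing import Iterable, Optional


def first_state(target_stdout: Iterable[str]) -> Optional[dict]:
    """Return the first state object emitted by a target, or None if it emits none.

    Assumes each state is written as a single JSON object per line (Singer convention).
    """
    for line in target_stdout:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        return json.loads(line)
    return None


def run_until_first_state(tap_cmd: list[str], target_cmd: list[str]) -> Optional[dict]:
    """Pipe tap -> target and stop both once the target reports its first state.

    Hypothetical helper: once a state message arrives, the records before it
    can be assumed to have reached the destination, so a preview can stop there.
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    tap.stdout.close()  # let the tap get SIGPIPE if the target exits first
    try:
        return first_state(target.stdout)
    finally:
        # The rest of the sync is not needed for a preview, so stop both processes.
        target.terminate()
        tap.terminate()
        target.wait()
        tap.wait()
```

Because the stopping point depends on the target's batch size, how many records a preview like this contains is still decided by the target's flush settings, as noted above.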

@tayloramurphy
Collaborator

We do have a `meltano test` command that invokes the tap and kills it after the first RECORD message comes through. Perhaps a flag to configure the number of records would work?
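
A rough sketch of what a configurable record count could look like, written as a filter over a tap's Singer output. `take_records` and its `max_records` parameter are hypothetical names, not part of `meltano test` today; the sketch generalizes the current stop-after-one-record behavior to stop after the N-th `RECORD` message.

```python
import json
from typing import Iterable, Iterator


def take_records(tap_stdout: Iterable[str], max_records: int = 1) -> Iterator[dict]:
    """Yield parsed Singer messages until `max_records` RECORD messages have been seen.

    SCHEMA and STATE messages seen along the way are passed through; reading stops
    right after the last allowed RECORD, at which point the caller can kill the tap.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    seen = 0
    for line in tap_stdout:
        if not line.strip():
            continue  # ignore blank lines
        message = json.loads(line)
        yield message
        if message.get("type") == "RECORD":
            seen += 1
            if seen >= max_records:
                return  # enough records for a preview; stop consuming output
```

With the default `max_records=1`, this reproduces the existing stop-after-one-record behavior.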

@tayloramurphy
Collaborator

@edgarrmondragon how hard would it be to make the number of records on the test command configurable?

@edgarrmondragon
Collaborator

> @edgarrmondragon how hard would it be to make the number of records on the test command configurable?

@tayloramurphy Well, it uses plugin commands and, similar to pytest, reports a summary of results if multiple test commands are selected. The problem is that it doesn't currently work for taps, because even after adding a test command:

```yaml
plugins:
  extractors:
  - name: tap-stackexchange
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-stackexchange.git
    commands:
      test:
        args: --test
```

Meltano doesn't know that it also needs to pass other options, like `--config`, so it fails if config is required.

I've created an issue to address this: #8112.

Also, this command can't be used in a pipeline as is because it outputs its test summary information to stdout.

Note
The above changes to Core are not required if one simply uses `meltano invoke tap-stackexchange --test`.

Note
Even with those changes to Core, the tap would still emit only one record per stream. Letting the user increase the limit would require changes to the SDK, either at the CLI level (something like `tap-stackexchange --test=100`) or at the config level (like meltano/sdk#1366). The latter is slightly easier, because the `--test` option is already a bit unusual in that it accepts a `schema` value (`--test=schema`).
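
To illustrate how a per-stream limit differs from a single global cap, here is a sketch applied to a tap's message stream. `limit_per_stream` and `max_records_per_stream` are hypothetical names; a config-level option in the SDK would presumably apply the same check inside the tap's own record loop rather than in an external filter like this.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def limit_per_stream(tap_stdout: Iterable[str], max_records_per_stream: int) -> Iterator[dict]:
    """Pass through Singer messages, keeping at most N RECORD messages per stream.

    Non-RECORD messages (SCHEMA, STATE, ...) are always forwarded so the
    target still receives every stream's schema.
    """
    counts: Counter = Counter()
    for line in tap_stdout:
        if not line.strip():
            continue  # ignore blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            stream = message["stream"]
            if counts[stream] >= max_records_per_stream:
                continue  # this stream already hit its preview limit
            counts[stream] += 1
        yield message
```

STATE messages pass through unchanged here, so the state from a preview run like this should not be persisted; otherwise a later incremental sync could skip the records that were dropped.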
