[Feature] CLI Parameter for packages-install-path #9932

Open
3 tasks done
stevenayers opened this issue Apr 13, 2024 · 2 comments · May be fixed by #9933
Labels
enhancement New feature or request triage


stevenayers commented Apr 13, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Add a CLI parameter for the packages-install-path, similar to how target-path has one.

In the docs, under target-path, it says:

Just like other global configs, it is possible to override these values for your environment or invocation by using the CLI option (--target-path) or environment variables (DBT_TARGET_PATH).

Describe alternatives you've considered

Using the env var DBT_PACKAGES_INSTALL_PATH.

The issue here is that some orchestration tools, such as Databricks DBT Workflows, make setting environment variables very difficult. By adding this CLI parameter, we maintain consistency across global configs.

Who will this benefit?

People using orchestration tools with awkward limitations.

Are you interested in contributing this feature?

Yes, the PR is #9933

stevenayers added the enhancement and triage labels Apr 13, 2024
stevenayers linked a pull request Apr 13, 2024 that will close this issue
dbeatty10 (Contributor) commented

Thanks for opening this @stevenayers !

Can you share more about the specific use cases where combining a CLI flag with an environment variable is necessary or beneficial, versus simply including the packages-install-path configuration in dbt_project.yml?

stevenayers (Author) commented

Hi @dbeatty10, sure, no problem! Let me break this down a bit.

Hardcoding packages-install-path

1. When Docker containers are used, hardcoding the path can cause difficulties. I won't go into too much detail because it's been documented quite well in issue #1710.

2. When you are dealing with orchestration/workflow systems, you will often find that each step runs in a different working directory from the previous one, and that these directories are often dynamic. Take this pipeline as an example:

  graph LR;
      A[dbt debug]-->B[dbt run];
      B-->C[dbt test];
      C-->D[dbt docs generate];

Each working directory could look something like /tmp/job-id/step-id

  • dbt debug: /tmp/1ad0ceb/ee74a60082b34c3a3d0df8a0d5d5cbfd7ec5ed6a
  • dbt run: /tmp/1ad0ceb/607646b627e80fe5e45545589fc8c09482010978
  • dbt test: /tmp/1ad0ceb/7e164e3ab723c357cb638ad6c1e1beef19a7fec6
  • dbt docs generate: /tmp/1ad0ceb/cb56f4fdc16d5a79953af3003645a1af5a000926

With this, you don't want to be re-installing your deps at every stage, and likely want to reuse them. This is where, like in issue #1710, you will want to use an environment variable like:

config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"

You could set packages-install-path: "../dbt_packages", but that assumes a fixed directory layout, and sometimes you need shell script logic to work out what that directory path should be.
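As a rough illustration of the kind of shell logic meant here, this sketch derives a packages directory shared by all steps of a job. It assumes the /tmp/job-id/step-id layout from the example above; the helper name shared_packages_path is hypothetical and not part of dbt.

```shell
# Hypothetical helper (not part of dbt): given a step working directory of the
# form /tmp/<job-id>/<step-id>, print a packages path shared by every step
# of the same job. Defaults to the current directory when no argument is given.
shared_packages_path() {
  step_dir="${1:-$PWD}"
  # dirname strips the trailing <step-id> component, leaving /tmp/<job-id>
  printf '%s/dbt_packages\n' "$(dirname "$step_dir")"
}

# Possible use inside a pipeline step, assuming dbt_project.yml reads the
# variable through env_var() as shown above:
#   export DBT_PACKAGES_INSTALL_PATH="$(shared_packages_path)"
#   dbt run
```

Each step would then resolve to the same /tmp/job-id/dbt_packages directory, whatever its own step directory is called.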

3. Say you have set packages-install-path to /tmp/my_custom_packages_path so it can be shared between steps. What if you're also running your CI/CD test pipeline in that environment?

Say your feature branch changes packages.yml, which updates the package contents in /tmp/my_custom_packages_path. If your live data pipeline is in the middle of a run, its next step fails because the feature branch has removed packages the live pipeline was using.

This is where you'll want to do something like:

config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"

and in your pipeline you'll want to set DBT_PACKAGES_INSTALL_PATH to something like /tmp/${ENVIRONMENT}/dbt_packages.
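A minimal sketch of that per-environment setup might look like the following. The helper name env_packages_path and the environment names are assumptions made for illustration; only DBT_PACKAGES_INSTALL_PATH and the /tmp/${ENVIRONMENT}/dbt_packages pattern come from the discussion above.

```shell
# Hypothetical helper: build an environment-scoped packages path so that a
# CI run and a live run never install into the same directory.
env_packages_path() {
  env_name="${1:?environment name required}"
  printf '/tmp/%s/dbt_packages\n' "$env_name"
}

# Possible use in a pipeline, assuming ENVIRONMENT is set by the orchestrator:
#   export DBT_PACKAGES_INSTALL_PATH="$(env_packages_path "$ENVIRONMENT")"
#   dbt deps
#   dbt run
```

With this, a feature-branch run with ENVIRONMENT=ci installs into /tmp/ci/dbt_packages and leaves the live pipeline's packages untouched.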

Flag vs env var for packages-install-path

As I mentioned in the original issue, sometimes setting an environment variable can be a pain in some workflow systems. This also isn't very consistent or clean:
DBT_PACKAGES_INSTALL_PATH=/tmp/${ENVIRONMENT}/dbt_packages dbt run --target-path /tmp/${ENVIRONMENT}/target

You're setting config paths via two different methods.
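For comparison, here is a sketch of what a flag-only invocation could look like. Note that --packages-install-path is the flag proposed in this issue, so treat it as an assumption: it may not exist in the dbt version you have installed. The helper name dbt_path_args is hypothetical as well.

```shell
# Hypothetical helper: print both path-related arguments for an environment.
# --target-path is an existing dbt flag; --packages-install-path is the flag
# PROPOSED in this issue and may not be available in your dbt version.
dbt_path_args() {
  env_name="${1:?environment name required}"
  printf '%s\n' "--target-path /tmp/${env_name}/target --packages-install-path /tmp/${env_name}/dbt_packages"
}

# Intended use if the proposed flag is available (word splitting of the
# unquoted substitution is deliberate here):
#   dbt run $(dbt_path_args "$ENVIRONMENT")
```

Both paths are then set the same way, on the command line, with no environment variable involved.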

2 participants