[CT-3319] [Bug] Running dbt.invoke(['deps']) changes the current working directory when using a DBT_PROJECT_DIR #8997

Open
2 tasks done
jelstongreen opened this issue Nov 3, 2023 · 6 comments · May be fixed by #9596
Labels
bug Something isn't working deps dbt's package manager file_system How dbt-core interoperates with file systems to read/write data help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors python_api Issues related to dbtRunner Python entry point

Comments

@jelstongreen

jelstongreen commented Nov 3, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We use a nested directory structure like:
projects/

  • project_a
  • project_b

When the DBT_PROJECT_DIR env var is set and dbt deps is invoked from Python, the current working directory is changed to DBT_PROJECT_DIR, which causes issues for the rest of the Python script.

Expected Behavior

The working directory should not be affected by the invocation.

Steps To Reproduce

Use a multi-project setup and invoke dbt deps from Python whilst the DBT_PROJECT_DIR env var is set.

Relevant log output

[MainThread] 2023-11-03 15:44:28,936 root         DEBUG    Current directory is /Users/josh.elston-green/repos/my_repo
15:44:28  Running with dbt=1.6.0
15:44:29  Installing calogica/dbt_expectations
15:44:33  Installed from version 0.8.5
15:44:33  Updated version available: 0.10.1
15:44:33  Installing calogica/dbt_date
15:44:33  Installed from version 0.7.2
15:44:33  Updated version available: 0.10.0
15:44:33  Installing dbt-labs/dbt_utils
15:44:33  Installed from version 1.1.0
15:44:33  Updated version available: 1.1.1
15:44:33  Installing dbt-labs/spark_utils
15:44:33  Installed from version 0.3.0
15:44:33  Up to date!
15:44:33  Installing dbt-labs/codegen
15:44:34  Installed from version 0.9.0
15:44:34  Updated version available: 0.11.0
15:44:34  Installing elementary-data/elementary
15:44:34  Installed from version 0.8.1
15:44:34  Updated version available: 0.12.1
15:44:34  Installing dbt-labs/dbt_project_evaluator
15:44:34  Installed from version 0.6.1
15:44:34  Updated version available: 0.8.0
15:44:34  
15:44:34  Updates available for packages: ['calogica/dbt_expectations', 'calogica/dbt_date', 'dbt-labs/dbt_utils', 'dbt-labs/codegen', 'elementary-data/elementary', 'dbt-labs/dbt_project_evaluator']                 
Update your versions in packages.yml, then run dbt deps
[MainThread] 2023-11-03 15:44:34,916 root         DEBUG    Current directory is /Users/josh.elston-green/repos/my_repo/projects/my_project

Environment

- OS: MacOS 14.0 (23A344)
- Python: 3.11
- dbt: 1.6.0

Which database adapter are you using with dbt?

spark

Additional Context

Databricks

@jelstongreen jelstongreen added bug Something isn't working triage labels Nov 3, 2023
@github-actions github-actions bot changed the title [Bug] Running dbt.invoke(['deps']) changes the current working directory when using a DBT_PROJECT_DIR [CT-3319] [Bug] Running dbt.invoke(['deps']) changes the current working directory when using a DBT_PROJECT_DIR Nov 3, 2023
@dbeatty10
Contributor

Good to get to hang with you at Coalesce in San Diego @jelstongreen 😎

Thanks for reporting this. We've had a few places where either the current working directory or relative paths throw a wrench into things, so I'll give this a look.

@dbeatty10 dbeatty10 self-assigned this Nov 3, 2023
@dbeatty10
Contributor

I see what you are saying @jelstongreen 👍

When I tried it out in a Python script using programmatic invocations like you described, deps changed the current working directory whenever the PROJECT_DIR global config was set via the --project-dir flag or the DBT_PROJECT_DIR environment variable.

In contrast, all of the following leave the current working directory as-is (as expected):

  • debug
  • parse
  • compile
  • list
  • run
  • build

Also in contrast, running the following directly from the CLI (without a Python script) does not change the current working directory:

pwd
dbt deps --project-dir project_a
pwd
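The difference between the CLI and the Python case is consistent with how processes work: when dbt runs as its own CLI process, any directory change it makes affects only that child process, so the shell's pwd is untouched. A dbtRunner invocation runs inside the calling Python process, so the change is visible to the caller. The sketch below is plain Python with no dbt involved; the helper names are hypothetical and only illustrate the process-boundary point.

```python
import os
import subprocess
import sys
import tempfile


def cwd_after_child_chdir(target: str) -> str:
    """Run a separate Python process that changes *its own* working
    directory, then return this (parent) process's working directory."""
    subprocess.run(
        [sys.executable, "-c", f"import os; os.chdir({target!r})"],
        check=True,
    )
    return os.getcwd()


def cwd_after_inprocess_chdir(target: str) -> str:
    """Change the working directory inside this process, the way an
    in-process call can, and return the resulting working directory."""
    os.chdir(target)
    return os.getcwd()


if __name__ == "__main__":
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as target:
        print("unchanged after child process:", cwd_after_child_chdir(target) == start)
        print("unchanged after in-process chdir:", cwd_after_inprocess_chdir(target) == start)
        os.chdir(start)  # restore before the temporary directory is deleted
```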

Reprex

Here's the reproduction case I used.

Make a subdirectory to hold a trivial dbt project:

mkdir project_a

Add the only required file for a dbt project:

project_a/dbt_project.yml

name: "package_a"
version: "1.0.0"
config-version: 2
profile: "sandcastle"

Even though we're going to invoke dbt deps, we don't even need to create a dependencies.yml or packages.yml file.
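If you'd rather script this setup, a small helper like the one below creates the same directory and file. It is a convenience sketch, not part of dbt, and `create_reprex_project` is a hypothetical name; the YAML is copied verbatim from the dbt_project.yml above.

```python
from pathlib import Path

# Same contents as project_a/dbt_project.yml in the reprex
DBT_PROJECT_YML = """\
name: "package_a"
version: "1.0.0"
config-version: 2
profile: "sandcastle"
"""


def create_reprex_project(base_dir: str = ".", name: str = "project_a") -> Path:
    """Create <base_dir>/<name>/dbt_project.yml and return the project dir."""
    project_dir = Path(base_dir) / name
    project_dir.mkdir(parents=True, exist_ok=True)
    (project_dir / "dbt_project.yml").write_text(DBT_PROJECT_YML)
    return project_dir
```

Call `create_reprex_project()` from the directory where runner.py will live.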

runner.py

from dbt.cli.main import dbtRunner, dbtRunnerResult
import os

# initialize
dbt = dbtRunner()

# Get and print the current working directory
cwd_before = os.getcwd()
print("Current working directory before:", cwd_before)

# "deps" modifies the current working directory when the PROJECT_DIR config is set,
# either via the DBT_PROJECT_DIR environment variable or the --project-dir CLI flag
cli_args = ["deps", "--project-dir", "project_a"]

# Run the command
res: dbtRunnerResult = dbt.invoke(cli_args)

cwd_after = os.getcwd()
print("Current working directory after:", cwd_after)

# Check if the current working directory has changed or not
assert cwd_after == cwd_before, "Current working directory has changed!"

Run the script and see that it fails the assertion:

python runner.py
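Until this is fixed in dbt-core, one caller-side workaround is to save and restore the working directory around the invocation. This is a minimal sketch, assuming you only need the process-wide working directory restored (it does not undo anything else the invocation does); `preserve_cwd` is a hypothetical helper, not a dbt API. On Python 3.11+, `contextlib.chdir` offers a similar restore-on-exit behavior, but it also changes into a target directory on entry.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def preserve_cwd() -> Iterator[str]:
    """Yield the current working directory and restore it on exit,
    even if the body of the with-block raises."""
    original = os.getcwd()
    try:
        yield original
    finally:
        os.chdir(original)


# Example usage with the runner.py script above:
#     with preserve_cwd():
#         res = dbt.invoke(cli_args)
```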

@dbeatty10 dbeatty10 removed the triage label Nov 6, 2023
@dbeatty10 dbeatty10 removed their assignment Nov 6, 2023
@jelstongreen
Author

Hey there @dbeatty10, thank you for taking a look, and good to hear that you were able to reproduce it. Is this likely to be fixed in an imminent release?

@rariyama

rariyama commented Dec 3, 2023

I encountered the same issue and investigated the code thoroughly. I found that the working directory changes when the deps task is initialized.
Here's what I found:

  1. When the deps command is executed via dbtRunner().invoke, the DepsTask class is initialized in core/dbt/cli/main.py in dbt-core. This class holds the command context and the functions for the deps command.
    https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/cli/main.py#L492

  2. When the class is initialized, it changes the working directory via move_to_nearest_project_dir.
    https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/task/deps.py#L100

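By modifying the deps function as follows, I confirmed that it then behaves the same as running dbt deps from the CLI:

import os

def deps(ctx, **kwargs):
    ...
    # Remember the caller's working directory; DepsTask.__init__ changes it
    cur_dir = os.getcwd()
    try:
        task = DepsTask(flags, ctx.obj["project"])
        results = task.run()
        success = task.interpret_results(results)
    finally:
        # Restore the original working directory even if the task fails
        os.chdir(cur_dir)
    return results, success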

I also confirmed that the same problem happens when running dbt clean from Python, because the directory also changes when CleanTask is initialized.
https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/task/clean.py#L24

If this approach looks okay to you, I'll create a PR. What do you think? I'm concerned that this could become a breaking change.

@jtcohen6
Contributor

jtcohen6 commented Dec 3, 2023

@rariyama Thanks for taking a look!

We're calling this a bug, because this directory-switching after-effect of deps/clean is not really the intended behavior. So I'd be supportive of fixing this, even if it means changing the behavior (and potentially breaking someone's workflow).

It feels like move_to_nearest_project_dir could become a context manager, that returns to the original working directory once the deps'ing / clean'ing / init'ing is complete. But very happy to hear if you have other ideas for how you'd go about fixing this :)
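A rough sketch of that idea, assuming the project directory has already been resolved (how dbt locates the nearest project directory is left out). `temporarily_in_project_dir` is a hypothetical name, not an existing dbt-core function:

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def temporarily_in_project_dir(project_dir: Union[str, os.PathLike]) -> Iterator[Path]:
    """Change into project_dir for the duration of the with-block, then
    return to the original working directory (also when an error occurs)."""
    original = os.getcwd()
    target = Path(project_dir).resolve()  # resolved relative to the original cwd
    os.chdir(target)  # if this fails, nothing has changed, so no restore is needed
    try:
        yield target
    finally:
        os.chdir(original)


# Hypothetical use inside a task (not dbt-core's actual code):
#     with temporarily_in_project_dir(project.project_root):
#         ...install packages / clean paths...
```

Keeping the `os.chdir(target)` call outside the `try` means a failed change (for example, a missing directory) raises immediately without attempting a pointless restore.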

@jtcohen6 jtcohen6 added python_api Issues related to dbtRunner Python entry point deps dbt's package manager file_system How dbt-core interoperates with file systems to read/write data help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors labels Dec 3, 2023
@rariyama

rariyama commented Dec 3, 2023

Hi @jtcohen6
Thank you for your feedback. Your idea sounds good to me.
What do you think about making the task class itself a context manager? Here's a sketch of what I have in mind:

# deps.py
import os
from pathlib import Path
# (other imports as already present in deps.py)

class DepsTask(BaseTask):
    def __init__(self, args: Any, project: Project) -> None:
        ...
        project.project_root = str(Path(project.project_root).resolve())

        # Remember the caller's directory before moving into the project
        self.orig_dir = os.getcwd()
        move_to_nearest_project_dir(project.project_root)
        super().__init__(args=args, config=None, project=project)
        self.cli_vars = args.vars

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the caller's directory; returning None lets any exception propagate
        os.chdir(self.orig_dir)
    ...

# main.py
def deps(ctx, **kwargs):
    ...
    with DepsTask(flags, ctx.obj["project"]) as task:
        results = task.run()
        success = task.interpret_results(results)
    return results, success
...

With the change above, I confirmed that the directory change only takes effect inside the with statement.
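The class-based pattern can be exercised without dbt. In the sketch below, `ToyTask` is hypothetical: it stands in for DepsTask and changes into a given directory instead of calling move_to_nearest_project_dir. It checks that the original directory comes back both on success and when run() raises.

```python
import os


class ToyTask:
    """Stand-in for a task whose constructor changes the working directory."""

    def __init__(self, project_root: str, fail: bool = False) -> None:
        self.orig_dir = os.getcwd()
        os.chdir(project_root)  # mimics move_to_nearest_project_dir
        self.fail = fail

    def __enter__(self) -> "ToyTask":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always return to where we started; returning None lets
        # any exception raised in the with-block propagate.
        os.chdir(self.orig_dir)

    def run(self) -> str:
        if self.fail:
            raise RuntimeError("simulated failure")
        return os.getcwd()
```

One caveat with this design: the directory change happens in `__init__`, before the with statement calls `__enter__`. If anything after the chdir in the constructor raises (in the proposal, `super().__init__` or reading `args.vars`), `__exit__` never runs and the directory stays changed. Moving the chdir into `__enter__` would close that gap.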

4 participants