dbx sync repo should not have default profile #833

Open
ep-mo opened this issue Aug 14, 2023 · 7 comments

Comments

@ep-mo

ep-mo commented Aug 14, 2023

Expected Behavior

dbx sync repo --dest-repo=test should default to use $DATABRICKS_HOST and $DATABRICKS_TOKEN environment variables for authentication.

Current Behavior

dbx sync repo --dest-repo=test defaults to --profile=DEFAULT, because the --profile argument is declared with [default: DEFAULT]. As a result, the default behavior is to authenticate from the config file rather than from environment variables (the approach recommended for CI/CD pipelines). Since our CI/CD pipeline does not have a config file, we get the following error: Could not find a databricks-cli config for profile DEFAULT

Our current workaround is to set an empty profile:
dbx sync repo --dest-repo=test --profile=
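
The behavior described above can be modelled with a small sketch. This is not dbx's source code: `resolve_credentials` and `ConfigNotFound` are hypothetical names, and the logic only mirrors what this issue reports (a non-empty profile reads the config file, an empty one falls back to the environment variables).

```python
import configparser
from pathlib import Path
from typing import Mapping, Optional, Tuple


class ConfigNotFound(Exception):
    """Raised when the requested profile is not available in the config file."""


def resolve_credentials(
    profile: Optional[str],
    env: Mapping[str, str],
    config_path: Path,
) -> Tuple[str, str]:
    """Hypothetical model of the lookup reported in this issue (not dbx code)."""
    if profile:
        # A non-empty profile name always means "read the config file".
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(config_path)  # reads nothing if the file does not exist
        if profile == "DEFAULT":
            section = dict(parser.defaults())
        elif parser.has_section(profile):
            section = dict(parser[profile])
        else:
            section = {}
        if "host" not in section or "token" not in section:
            raise ConfigNotFound(
                f"Could not find a databricks-cli config for profile {profile}"
            )
        return section["host"], section["token"]
    # Empty or missing profile: fall back to the environment variables.
    return env["DATABRICKS_HOST"], env["DATABRICKS_TOKEN"]
```

With `profile="DEFAULT"` as the implicit default, a pipeline without a config file always takes the failing branch. Passing `--profile=` corresponds to `profile=""`, and the expected behavior requested here corresponds to a default of `None`.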

Steps to Reproduce (for bugs)

  • $DATABRICKS_HOST and $DATABRICKS_TOKEN must be specified
  • ~/.databrickscfg should not exist
  • run dbx sync repo --dest-repo=<repo>

Context

CI/CD pipeline using dbx sync repo

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version:

@CristianSmau

CristianSmau commented Sep 14, 2023

I have the same issue.

However, as a workaround, the databricks-cli DEFAULT profile can be configured by running an additional pipeline step:

  - bash: |
      DATABRICKS_HOST='https://adb-XXXXXXXXXXX.azuredatabricks.net/' # This can easily be parametrized to avoid passing the value in clear text.
      DATABRICKS_TOKEN='XXXXXXXXXXXXXXXXXXXXXXXXXXX' # This can easily be parametrized to avoid passing the value in clear text.
      databricks configure --token --profile DEFAULT <<EOF
      $DATABRICKS_HOST
      $DATABRICKS_TOKEN
      EOF
    displayName: 'Databricks Configuration for repo Sync'
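
For pipelines where a scripting step is easier than a heredoc, the same idea can be sketched in Python. This is only an illustration under assumptions: `write_default_profile` is a hypothetical helper, it writes the `[DEFAULT]` section with `host` and `token` keys directly instead of calling `databricks configure`, and it takes the values from environment variables that the pipeline is assumed to provide as secrets.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping


def write_default_profile(env: Mapping[str, str], config_path: Path) -> None:
    """Write a [DEFAULT] profile using the host and token found in `env`."""
    # interpolation=None keeps '%' characters in a token from being
    # interpreted as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser["DEFAULT"] = {
        "host": env["DATABRICKS_HOST"],
        "token": env["DATABRICKS_TOKEN"],
    }
    # Request owner-only permissions because the file holds a secret.
    # The mode only applies when the file is newly created.
    fd = os.open(config_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)
```

A pipeline step could call `write_default_profile(os.environ, Path.home() / ".databrickscfg")` before running dbx. The secret still ends up in a plaintext file on the build agent, so this shares the drawback of the shell version.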

@doug-cresswell

Here's a workaround that can be quite useful, but it might not be the best fit for containerized deployments, especially when you're running dbx deploy inside a Docker container. In this scenario:

A: You'll need to create the cfg file in your Dockerfile during the build process.
B: Consequently, you'll have to rebuild the image whenever there are changes to these environment variables.

One thing to keep in mind is that this approach stores your secrets in plaintext in a file, which isn't ideal 👎. The cfg profiles are primarily designed for local development, which makes them less suitable for continuous integration and continuous deployment (CI/CD).

Using the profile also complicates the typical process of customizing a container with environment variables. Since the cfg profile is created during the build, it's hard to use the same image across multiple environments. This gets awkward with CI/CD tools like GitLab and GitHub Actions, which rely on secret environment variables.

Don't get me wrong—I'm currently using this workaround myself, but it's worth acknowledging that the issue still deserves attention.

@ep-mo
Author

ep-mo commented Sep 14, 2023

The workaround we use in our CI/CD pipeline does not require a cfg file, and is quite simple. We just pass an empty --profile= argument (as described in the original issue), and dbx will fall back to using $DATABRICKS_HOST and $DATABRICKS_TOKEN. It took some trial and error to figure out.

Our current workaround is to set an empty profile:
dbx sync repo --dest-repo=test --profile=
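
If the pipeline calls dbx from a script, the empty-profile workaround can be wrapped so that a missing variable fails early with a clear message. This is a minimal sketch: `build_sync_command` and `run_sync` are hypothetical helper names, and the only dbx arguments used are the ones shown in this thread.

```python
import subprocess
from typing import List, Mapping

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def build_sync_command(dest_repo: str, env: Mapping[str, str]) -> List[str]:
    """Return the dbx command line with an explicitly empty profile."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # "--profile=" overrides the DEFAULT profile, so dbx is expected to fall
    # back to the environment variables, as described in this issue.
    return ["dbx", "sync", "repo", f"--dest-repo={dest_repo}", "--profile="]


def run_sync(dest_repo: str, env: Mapping[str, str]) -> None:
    """Run the sync and raise CalledProcessError if dbx exits with an error."""
    subprocess.run(build_sync_command(dest_repo, env), env=dict(env), check=True)
```

In a pipeline this would be called as `run_sync("test", os.environ)`.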

@doug-cresswell

Thanks @ep-mo, I didn't pick up on the empty profile workaround on my first read through. I will try to reproduce myself. Where you say

~/.databrickscfg should not exist

does this mean the empty profile workaround only works if the file is not present?

@ep-mo
Author

ep-mo commented Sep 14, 2023

does this mean the empty profile workaround only works if the file is not present?

No, I don't think it matters. Off the top of my head, the workaround should work even if the cfg file is present. When you specify --profile=, dbx should use your environment variables, whether or not the file is there.

In the Steps to Reproduce section I only tried to explain how to reproduce the issue and get a stack trace. If you have a cfg file when you try to reproduce, dbx will just fall back to the DEFAULT profile in that file, and everything seems to work fine. However, the expected behavior is to use environment variables, not the cfg file, when the profile argument is not given.

@CristianSmau

CristianSmau commented Sep 14, 2023

Looks like DBX got deprecated and replaced by DAB. Worth looking into the new package.

@ep-mo
Author

ep-mo commented Sep 14, 2023

Looks like DBX got deprecated and replaced by DAB. Worth looking into the new package.

As far as I know, dbx is not yet deprecated and is still maintained by Databricks Labs (source). Our CI/CD is built around dbx, so we will continue to use dbx, probably at least until DAB is generally available. But yeah, worth looking into the new package. If we started from scratch today, we would probably look into DAB, because it's expected to supersede dbx.
