
## How to parameterize a DBX Python Notebook #841

Open
ssr8998 opened this issue Aug 31, 2023 · 5 comments

ssr8998 commented Aug 31, 2023

The overall goal is to make the database name (prod/dev/test) dynamic for each notebook in a dbx job, passing the database name directly from Jenkins without modifying the notebook file or the deployment.yaml file for each environment.
I am creating a dbx job with a few Databricks notebooks, and I want to pass the database name dynamically into each Python notebook without using Databricks widgets (assuming I use sys.argv to read the dbx CLI parameter). I want to run my job with something like:

dbx launch --job "my_job_name" --parameters='{"db_name": "my_db_name"}'

which would send that info to my job and to every associated notebook, which would read it from conf/deployment.yaml. In the deployment.yaml file I would have something like:

notebook_task:
  notebook_path: "/Repos/My_github_repo/blala/notebookname"
  base_parameters:
    db_name: "{{ env.db_name_from_env }}"

Your Environment

  • dbx version used: 0.7.4
  • databricks-cli: 0.17.3
  • spark_version: 12.2.x-scala2.12
  • Databricks Runtime version: 12.2 LTS or above

doug-cresswell commented Sep 4, 2023

Edit: I did not realise you specified a notebook task; updated, with the original comment left underneath.
Edit 2: Updated the CLI snippets to use the same environment as the yml example.

To pass a value from a local environment variable to a notebook task, you should instead define the environment variable in the cluster configuration and read it inside the notebook, e.g. database_name = os.environ.get('DATABASE_NAME'). This can be done in deployment.yml.

  basic-cluster: &basic-cluster
    new_cluster:
      spark_version: "10.4.x-cpu-ml-scala2.12"
      spark_conf:
        <<: *basic-spark-conf
        spark.databricks.passthrough.enabled: false
      spark_env_vars:
        DATABASE_NAME: "{{ env['DATABASE_NAME'] }}"
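
Inside the notebook the variable can then be read from the process environment. A minimal sketch, assuming the DATABASE_NAME variable from the cluster configuration above (the "dev" fallback is only an illustrative default):

import os

# DATABASE_NAME is injected by the cluster via spark_env_vars above;
# the "dev" fallback is just an illustrative default for local runs.
database_name = os.environ.get("DATABASE_NAME", "dev")
print(f"Using database: {database_name}")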

deployment.yml reference

See original comment below for how to use jinja with the deployment file.


Original comment

It is probably better practice to deploy separate workflows for separate environments, but to answer your question you can use the jinja support functionality (Jinja Support) combined with environment variables.

Also see Passing Parameters

Your deployment file should look something like this:
conf/deployment.yml.j2

build:
  python: "pip"

environments:
  default:
    workflows:
      - name: "my-workflow"
        tasks:
          - task_key: "task1"
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"
              parameters: ["database_name", "{{ env['DATABASE_NAME'] }}"]

Deploy via CLI

export DATABASE_NAME=dev
dbx deploy --environment default --deployment-file conf/deployment.yml.j2 "my-workflow"

Launch via CLI

dbx launch --environment default --parameters="{\"python_params\": [\"database_name\", \"${DATABASE_NAME}\"]}" "my-workflow"
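
The wheel's entry point receives these values as plain command-line arguments. A minimal sketch of reading them, assuming the some-pkg / some-ep entry point above and the simple ["database_name", "<value>"] convention:

import sys

def main() -> None:
    # python_wheel_task parameters arrive as ordinary argv entries,
    # e.g. ["database_name", "dev"], not as a single JSON string.
    args = sys.argv[1:]
    params = dict(zip(args[0::2], args[1::2]))
    database_name = params.get("database_name", "dev")
    print(f"Using database: {database_name}")

if __name__ == "__main__":
    main()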

Note that you will need to append the .j2 extension to your yaml file, or alternatively enable in-place Jinja support in your project configuration.


ssr8998 commented Sep 6, 2023

I tried to follow your steps.
Here is how my deployment.yaml.j2 looks:

{% set db_name = env['db_name'] | default('name_of_my_db') %}
...basic config etc. etc...

spark_python_task:
  python_file: "file://my_path_/name_of_python_notebook_converted_to_job.py"
  parameters: ["db_name", "{{ env['db_name'] }}"]
...

Now I am trying to access this database name in my name_of_python_notebook_converted_to_job.py by calling:

db_name = json.loads(sys.argv[1]).get('python_params', [])[1]

I am calling the dbx CLI like:

dbx deploy --deployment-file conf/deployment.yaml.j2 "name_of_my_work_flow"

and then, to launch the job:

dbx launch --parameters='{"python_params": ["db_name", "${db_name}"]}' "name_of_my_work_flow"

It looks like my job can't read from sys.argv. I am getting the error JSONDecodeError: Expecting value: line 1 column 1 (char 0) at:

----> db_name = json.loads(sys.argv[1]).get('python_params', [])[1]
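
A minimal sketch of reading such parameters, assuming a spark_python_task passes them to the script as separate command-line arguments rather than as a single JSON string:

import sys

# Assumes the deployment-file parameters arrive as individual argv entries,
# e.g. ["db_name", "my_db_name"], not as one JSON-encoded string.
args = sys.argv[1:]
params = dict(zip(args[0::2], args[1::2]))
db_name = params.get("db_name")
print(f"db_name = {db_name}")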


ssr8998 commented Sep 6, 2023

If I use:

export DATABASE_NAME=dev
dbx deploy -e dev --deployment-file conf/deployment.yml.j2 "my-workflow"

it complains that "environment dev not found in the project file .dbx/project.json". In my project.json I have environment --> default --> profile, storage_type, properties --> workspace_directory, artifact_location.


doug-cresswell commented Sep 7, 2023

JSONDecodeError

Notebooks use widgets to pass parameters, so you cannot pass parameters to a notebook task the way you would for an entry point in a Python wheel. You either need to use widgets, or define environment variables on the cluster using spark_env_vars. This way the environment variables will be available to the notebook through os.environ.
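
A minimal sketch of both options inside the notebook (the db_name / DATABASE_NAME names are just illustrative):

# Option 1: read a base_parameters value through a widget
dbutils.widgets.text("db_name", "dev")        # registers the widget with a default
db_name = dbutils.widgets.get("db_name")

# Option 2: read a cluster environment variable defined via spark_env_vars
import os
db_name = os.environ.get("DATABASE_NAME", "dev")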

Environment Not Found Error

For the error "environment dev not found in the project file .dbx/project.json", the environments defined in your deployment yaml must match those in your project.json file.

environments:
  default:

You can use the dbx configure command to set up new environments in your project if you need more than one. If not, simply remove the -e / --environment flag from your CLI commands and the "default" environment will be used instead.
dbx configure docs
project.json docs
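
A minimal sketch of registering a dev environment (the profile name is illustrative, and flag names may differ slightly between dbx versions):

# registers a "dev" environment in .dbx/project.json, pointing at the "dev" Databricks CLI profile
dbx configure --environment dev --profile dev

# after which the earlier deploy command should find the environment
dbx deploy -e dev --deployment-file conf/deployment.yml.j2 "my-workflow"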


ssr8998 commented Sep 7, 2023 via email
