Passing parameters in Databricks workflows

argl1995dbks
New Contributor III

Hi Databricks, we have created several Databricks workflows, and the `job-definition.json` for each is stored in version control, i.e. GitHub. Several parameters inside this job definition are read from `params.json`, but the issue is that `params.json` is hardcoded in the current design.

We are trying to figure out a way to pass config parameters to the Controller workflow without using a hotfix branch or the Databricks UI, with some approvals/governance in place. Is there a better way of doing this, so that we don't have to dig inside `params.json` to update/modify the existing keys/values?

I am adding the basic tree structure for my MLOps repo:

.
├── README.md
├── databricks_notebook_jobs
│   ├── CONTROLLER_JOB_CLUSTER
│   │   ├── YOUR_FIRST_NOTEBOOK_NAME.py
│   │   ├── job-definition.json
│   │   └── spark_env_vars
│   │       ├── dev.json
│   │       ├── expl.json
│   │       ├── prod.json
│   │       └── qa.json
│   ├── all_cluster_init.sh
│   ├── default_job_config.json
│   ├── example_module.py
│   └── release.json
├── devops
│   └── params.json
├── requirements.txt
├── utils
│   ├── Config.py
│   ├── Config_old.py
│   └── setenv.py
├── test_requirements.txt
├── tests
│   └── test_cleaning_utils.py
└── version.py

 

Please suggest.
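For context, a hedged sketch of one direction the replies below point toward: if the Controller job is managed as code (for example through a Databricks Asset Bundle), values can be declared as job-level parameters instead of living only in a hardcoded `params.json`. The resource key, job name, and parameter names here are illustrative placeholders, not taken from the thread:

resources:
  jobs:
    controller_job:        # placeholder resource key
      name: Controller     # placeholder job name
      parameters:
        - name: env
          default: dev
        - name: input_path
          default: /mnt/config/input

Runs can then override these values at trigger time (for example via the Jobs `run-now` API), so routine value changes need neither a hotfix branch nor manual edits in the UI.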

 

4 REPLIES

jacovangelder
Honored Contributor

Have you considered using Databricks Asset Bundles? Very easy to parameterize! 
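For reference, a minimal self-contained sketch of what that can look like, with the variable declared and referenced in the same configuration tree (the bundle name, job name, and cluster settings are illustrative assumptions):

bundle:
  name: mlops_bundle  # placeholder name

variables:
  job_cluster_key:
    description: Databricks job cluster
    default: job_cluster

resources:
  jobs:
    sum_job:
      name: sum-job  # placeholder name
      job_clusters:
        - job_cluster_key: ${var.job_cluster_key}
          new_cluster:
            spark_version: 13.3.x-scala2.12
            num_workers: 1
      tasks:
        - task_key: Sum_Task
          job_cluster_key: ${var.job_cluster_key}
          spark_python_task:
            python_file: ../src/sum.py

`databricks bundle validate` resolves `${var.job_cluster_key}` here because the declaration and the reference sit in the same merged configuration.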

Hi @jacovangelder, I have created a `variables.yaml` file:

 
variables:
  job_cluster_key:
    description: Databricks job cluster
    default: job_cluster

 

and I am referring to this variable in my `job.yaml` file like this:

tasks:
  - task_key: Sum_Task
    job_cluster_key: ${var.job_cluster_key}
    spark_python_task:
      python_file: ../src/sum.py
 
and when I tried to validate it using `databricks bundle validate`, I got this error:
 
**Error: reference does not exist: ${variables.job_cluster_key}**
 
However, when I added the variables in the same `job.yaml` file, it validated successfully.

Any reason why that is?
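One plausible explanation, offered as an assumption rather than a confirmed diagnosis: a separate `variables.yaml` only becomes part of the bundle if the root `databricks.yml` merges it in through the top-level `include` mapping; otherwise the declaration is invisible to the bundle and any reference to it "does not exist". A sketch of such a root file (the file names match the thread, the layout is assumed):

# databricks.yml
bundle:
  name: mlops_bundle  # placeholder name

include:
  - variables.yaml
  - job.yaml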

Hi @jacovangelder, I found the solution: when you're referring to the variables from the same file, they should be referenced as ${var.varname}, and when you're referring to them from a different file, use ${vars.varname}. Though I wonder why that is. Is there any specific reason?

Also, I got an error at deploy time while referring to the variables from another file, i.e. **A managed resource "vars" "job_cluster_key" has not been declared in the root module.**
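On the original governance question, a hedged sketch of how per-environment values can live in the bundle itself: `targets` can override variable defaults, so no file needs hand-editing per deployment (the target names mirror the repo's dev/expl/qa/prod convention; the cluster key values are placeholders):

targets:
  dev:
    variables:
      job_cluster_key: dev_job_cluster
  qa:
    variables:
      job_cluster_key: qa_job_cluster
  prod:
    variables:
      job_cluster_key: prod_job_cluster

Individual values can also be overridden without touching any file, e.g. `databricks bundle deploy -t dev --var="job_cluster_key=dev_job_cluster"`, or via a `BUNDLE_VAR_job_cluster_key` environment variable in CI, which keeps routine changes out of hotfix branches.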

Hi @Retired_mod, please allow me some time; I will get back to you.
