Data Engineering

Databricks Asset Bundles: Unable to fetch variables from variable-overrides.json

Venugopal
New Contributor III

Hi,

I am using Databricks CLI 0.227.1 to create a bundle project for deploying a job.

As per https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted to use variable-overrides.json to hold my variables.

I created a JSON file at .databricks/bundle/dev/variable-overrides.json.

I then referred these variables in the job definition like this:

resources:
  jobs:
    venu_test_poc1_job:
      name: venu_test_poc1_job
      tasks:
        - task_key: ${var.task_key}
          job_cluster_key: ${var.job_cluster_key}
          spark_python_task:
            python_file: ../src/venu_test_poc1/task.py
      job_clusters:
        - job_cluster_key: ${var.job_cluster_key}
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 3

This is my variable-overrides.json:

{
  "task_key": "ms-ip-usage-collection-task",
  "job_cluster_key": "ms-ip-usage-collection-job-cluster"
}

When I run databricks bundle validate, I get the below error:

Error: reference does not exist: ${var.job_cluster_key}

Based on that error, I tried using vars.variablename instead of var.variablename, and the validate command then passed.

But the bundle deploy command threw the below error:

Error: Reference to undeclared resource

on bundle.tf.json line 45, in resource.databricks_job.venu_test_poc1_job.task[0]:
45: "job_cluster_key": "${vars.job_cluster_key}",

A managed resource "vars" "job_cluster_key" has not been declared in the root
module.

Error: Reference to undeclared resource

on bundle.tf.json line 49, in resource.databricks_job.venu_test_poc1_job.task[0]:
49: "task_key": "${vars.task_key}"

A managed resource "vars" "task_key" has not been declared in the root
module.

To get things moving, I put the variable definitions and their assignments in the bundle config file databricks.yml itself. But this doesn't look clean, as there will be a lot of variables to manage, and it is best to keep them in a separate JSON file.
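For reference, that inline workaround looked roughly like this in databricks.yml (a sketch; the defaults reuse the values from my variable-overrides.json):

variables:
  task_key:
    description: Task key for the collection job.
    default: ms-ip-usage-collection-task
  job_cluster_key:
    description: Cluster key used by the task and the job_clusters entry.
    default: ms-ip-usage-collection-job-cluster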

Please help me understand how to get the values picked up from variable-overrides.json.

Thanks,

Venu


5 REPLIES

ashraf1395
Valued Contributor III

Hi there @Venu,

You have to declare those variables in databricks.yml as well:

variables:
  task_key:
  job_cluster_key:

Then you can reference them in your job definition as ${var.task_key} or ${var.job_cluster_key}, the way you did.

And keep your variables file at .databricks/bundle/dev/variable-overrides.json:

{
  "task_key": "abc",
  "job_cluster_key": "efg"
}

This will make the bundle run and deploy, but it won't solve your original problem of keeping the code clean.

For that, you should split your mappings into multiple bundle configuration files, like bundle.variables.yml, where you can specify all your variables.

Similarly, you can have bundle.resources.yml or bundle.targets.yml; you can even have separate files for each of your resources, like pipelines, jobs, clusters, etc.

Keep them in a single folder, and then at the top of your databricks.yml add:

include:
  - bundle.variables.yml
  - bundle.resources.yml
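For example, bundle.variables.yml might then contain just the declarations (a sketch using the variable names from this thread):

variables:
  task_key:
    description: Task key for the job.
  job_cluster_key:
    description: Cluster key shared by the task and the job cluster.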

You can find more information in the Databricks bundle configuration syntax docs: https://docs.databricks.com/aws/en/dev-tools/bundles/settings

Venugopal
New Contributor III

Hi @ashraf1395 - I tried the approach you suggested, but it is not working as expected.

My databricks.yml file has:

bundle:
  name: ip_obs_bundle

variables:
  v_task_key:
    description: The task key.
    default: "default_taskkey"
  v_job_cluster_key:
    description: The cluster key.
    default: "default_clusterkey"
  v_job_name:
    description: The job name.
    default: "default_jobname"

include:
  - resources/*.yml
My job definition under the resources folder:
resources:
  jobs:
    ip_obs_bundle_job:
      name: ${var.v_job_name}
      tasks:
        - task_key: ${var.v_task_key}
          job_cluster_key: ${var.v_job_cluster_key}
          spark_python_task:
            python_file: ../src/venu_ip_obs_poc1/task.py
      job_clusters:
        - job_cluster_key: ${var.v_job_cluster_key}
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 3
 
My variable-overrides.json under projectroot/.databricks/bundle/dev looks like:
{
  ---truncated
  "v_job_name": "overrride_jobname",
  "v_task_key": "override_taskkey",
  "v_job_cluster_key": "override_clusterkey",
  ---truncated
}
 
I don't have any environment variables or command-line variables that could conflict with variable-overrides.json.
Even with this in place, the deployed jobs and tasks get their names from the default values in the variable definitions in databricks.yml; they don't pick up the values from variable-overrides.json.
 
Can you please help with this? I'm not sure what I am missing here.

ashraf1395
Valued Contributor III

Hi there @Venugopal ,
According to the Databricks precedence order, override values should take priority over default values.

[screenshot: variable value precedence order]

 

Anyway, if you just put the variable names in the databricks.yml file, it will work. No need to give default values if they are not required:

variables:
  v_task_key:
    description: The task key.
  v_job_cluster_key:
    description: The cluster key.
  v_job_name:
    description: The job name.
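(This also gives you a quick diagnostic: with no defaults declared and no value supplied via a --var flag or a BUNDLE_VAR_ environment variable, databricks bundle validate should fail on any variable that variable-overrides.json does not supply, which tells you whether the file is actually being picked up.)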



Venugopal
New Contributor III

@ashraf1395 Thanks for your inputs. Let me check this and get back to you.

Venugopal
New Contributor III

@ashraf1395 any thoughts on the above issue?
