Data Engineering

Databricks Asset Bundles: Unable to fetch variables from variable-overrides.json

Venugopal
New Contributor III

Hi,

I am using Databricks CLI 0.227.1 to create a bundle project for deploying a job.

As per https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted to use variable-overrides.json to hold my variables.

I created a JSON file at .databricks/bundle/dev/variable-overrides.json.

I then referred these variables in the job definition like this:

resources:
  jobs:
    venu_test_poc1_job:
      name: venu_test_poc1_job
      tasks:
        - task_key: ${var.task_key}
          job_cluster_key: ${var.job_cluster_key}
          spark_python_task:
            python_file: ../src/venu_test_poc1/task.py
      job_clusters:
        - job_cluster_key: ${var.job_cluster_key}
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 3

This is my variable-overrides.json:

{
  "task_key": "ms-ip-usage-collection-task",
  "job_cluster_key": "ms-ip-usage-collection-job-cluster"
}

When I run databricks bundle validate, I get the below error:

Error: reference does not exist: ${var.job_cluster_key}

Based on that error, I tried using vars.variablename instead of var.variablename, and the validate command then passed.

But the bundle deploy command threw the below error:

Error: Reference to undeclared resource

on bundle.tf.json line 45, in resource.databricks_job.venu_test_poc1_job.task[0]:
45: "job_cluster_key": "${vars.job_cluster_key}",

A managed resource "vars" "job_cluster_key" has not been declared in the root
module.

Error: Reference to undeclared resource

on bundle.tf.json line 49, in resource.databricks_job.venu_test_poc1_job.task[0]:
49: "task_key": "${vars.task_key}"

A managed resource "vars" "task_key" has not been declared in the root
module.

To get things moving, I put the variable definitions and their assignments in the bundle config file databricks.yml itself. But this doesn't look clean, as there will be a lot of variables to manage, and it is best to keep them in a separate JSON file.
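For reference, that inline workaround looked roughly like this in databricks.yml (a sketch; the defaults reuse the values from my variable-overrides.json):

variables:
  task_key:
    description: Task key for the collection job.
    default: ms-ip-usage-collection-task
  job_cluster_key:
    description: Cluster key used by the task and the job_clusters entry.
    default: ms-ip-usage-collection-job-cluster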

Please help me understand how to get the values picked up from variable-overrides.json.

Thanks,

Venu


5 REPLIES

ashraf1395
Valued Contributor III

Hi there @Venu,

You have to declare those variables in databricks.yml as well:

variables:
  task_key:
  job_cluster_key:

Then you can reference them in your job definition as ${var.task_key} or ${var.job_cluster_key}, the way you did.

And keep your variables file at .databricks/bundle/dev/variable-overrides.json:

{
  "task_key": "abc",
  "job_cluster_key": "efg"
}

This will make the bundle run and deploy, but it won't solve your original problem of keeping the code clean.

For that, you should split your mappings into multiple bundle configuration files, like bundle.variables.yml, where you can specify all your variables.

Similarly, you can have bundle.resources.yml or bundle.targets.yml; you can even have separate files for each of your resources, like pipelines, jobs, clusters, etc.

Keep them in a single folder, and then at the top of your databricks.yml add:

include:
  - bundle.variables.yml
  - bundle.resources.yml
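For example, bundle.variables.yml might then contain just the declarations (a sketch using the variable names from this thread):

variables:
  task_key:
    description: Task key for the job.
  job_cluster_key:
    description: Cluster key shared by the task and the job cluster.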

You can find more information in the Databricks bundle configuration syntax docs: https://docs.databricks.com/aws/en/dev-tools/bundles/settings

Venugopal
New Contributor III

Hi @ashraf1395 - I tried the approach you suggested, but it is not working as expected.

My databricks.yml file has:

bundle:
  name: ip_obs_bundle

variables:
  v_task_key:
    description: The task key.
    default: "default_taskkey"
  v_job_cluster_key:
    description: The cluster key.
    default: "default_clusterkey"
  v_job_name:
    description: The job name.
    default: "default_jobname"

include:
  - resources/*.yml
My job definition under the resources folder:
resources:
  jobs:
    ip_obs_bundle_job:
      name: ${var.v_job_name}
      tasks:
        - task_key: ${var.v_task_key}
          job_cluster_key: ${var.v_job_cluster_key}
          spark_python_task:
            python_file: ../src/venu_ip_obs_poc1/task.py
      job_clusters:
        - job_cluster_key: ${var.v_job_cluster_key}
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 3
 
My variable-overrides.json under projectroot/.databricks/bundle/dev looks like:
{
  ---truncated
  "v_job_name": "overrride_jobname",
  "v_task_key": "override_taskkey",
  "v_job_cluster_key": "override_clusterkey",
  ---truncated
}
 
I don't have any environment variables or command-line variables that could conflict with variable-overrides.json.
Even with this in place, the deployed jobs and tasks get their names from the default values in the variable definitions in databricks.yml; they don't pick up the values from variable-overrides.json.
 
Can you please help with this? I'm not sure what I am missing here.

ashraf1395
Valued Contributor III

Hi there @Venugopal ,
According to the Databricks precedence order, override values should take priority over default values.

[screenshot: variable value precedence order]

 

Anyway, if you just put the variable names in the databricks.yml file, it will work. No need to give default values if they are not required:

variables:
  v_task_key:
    description: The task key.
  v_job_cluster_key:
    description: The cluster key.
  v_job_name:
    description: The job name.
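(This also gives you a quick diagnostic: with no defaults declared and no value supplied via a --var flag or a BUNDLE_VAR_ environment variable, databricks bundle validate should fail on any variable that variable-overrides.json does not supply, which tells you whether the file is actually being picked up.)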



Venugopal
New Contributor III

@ashraf1395 Thanks for your inputs. Let me check this and get back to you.

Venugopal
New Contributor III

@ashraf1395 any thoughts on the above issue?
