
policy_id in databricks asset bundle workflow

Aria
New Contributor III

We are using Databricks Asset Bundles for code deployment, and the biggest issue I am facing is that the policy_id is different in each environment. I tried with environment variables in Azure DevOps, and also with declaring the variables in databricks.yaml and then using them in the resources folder. However, nothing has worked so far.

@policy_id @DAB

4 REPLIES

maikl
New Contributor II

Hi Aria,

did you solve the issue, or did you find a workaround?

Thank you.

-werners-
Esteemed Contributor III

I use it and it works (kinda).
What you have to do is define a variable for the compute policy in the variables section (databricks.yml).
In the targets section (also databricks.yml) you set the policy ID per environment (dev, prod, ...).
Then, in your job YAML, you reference the variable in the new_cluster section of the cluster definition:
policy_id: ${var.policy}
That way the policy will be used for cluster creation.
Mind that you still have to pass certain values even though the policy already contains them (spark_version, for example).
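
A minimal sketch of that setup, assuming placeholder policy IDs and a hypothetical bundle name, spark version, and node type (adjust these to your own workspaces):

# databricks.yml
bundle:
  name: my_bundle   # hypothetical bundle name

variables:
  policy:
    description: Cluster policy ID for the current target

targets:
  dev:
    variables:
      policy: <POLICY_ID from the dev workspace>
  prod:
    variables:
      policy: <POLICY_ID from the prod workspace>

# resources/read_data_lake.yml
resources:
  jobs:
    read_data_lake:
      name: read_data_lake
      job_clusters:
        - job_cluster_key: Job_cluster
          new_cluster:
            policy_id: ${var.policy}
            # these still have to be passed even if the policy already pins them
            spark_version: 15.4.x-scala2.12   # placeholder runtime version
            node_type_id: Standard_DS3_v2     # placeholder node type
            num_workers: 1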

maikl
New Contributor II

When you define the policy_id directly in the target block, the variable isn't needed. Here is what works for me:

For now, I solved it by overriding the value of policy_id in the targets section:

targets:
  dev:
    workspace:
      root_path: ~/DATABRICKS_BUNDLES
    resources:
      jobs:
        read_data_lake:
          name: read_data_lake
          job_clusters:
            - job_cluster_key: Job_cluster
              new_cluster:
                policy_id: <POLICY_ID from databricks-workspace-a>
  tst:
    workspace:
      root_path: ~/DATABRICKS_BUNDLES
    resources:
      jobs:
        read_data_lake:
          name: read_data_lake
          job_clusters:
            - job_cluster_key: Job_cluster
              new_cluster:
                policy_id: <POLICY_ID from databricks-workspace-b>
 
I think it's useful for a few jobs, but when you have more it becomes impractical, because you must define each job in the target block just to set its policy_id. If I missed something or am wrong, please let me know 🙂

-werners-
Esteemed Contributor III

Variables are useful, but it depends on how you set up the bundles.

I define a policy per target. Since 'policy' is not a property the target knows about by itself, I create a variable and assign it a different value depending on the environment.
This variable is then used in the jobs, which reside in another file.

I keep the environment definitions and the job definitions separate, so it is easier to promote from dev to prod.
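
With the variable-per-target setup sketched above, promoting is then just a matter of deploying the same bundle against a different target (target names illustrative):

databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle deploy -t prod

Each deploy resolves ${var.policy} to the value declared for that target, so the job definition file never changes between environments.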
