2 weeks ago
Hi, I'm trying to create a Terraform script that does the following:
- create a policy where I specify env variables and libraries
- create a cluster that inherits from that policy and uses the env variables specified in the policy.
I saw in the documentation the `apply_policy_default_values` attribute, which should set the env variables as defined in the policy definition, but it doesn't seem to do that.
How can I get the desired behavior?
Here's the code:
```hcl
locals {
  policy_environmental_variables = {
    "spark_env_vars.VAR1" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "" : "value2"
    },
    "spark_env_vars.VAR2" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "value1" : "value2"
    },
    "spark_env_vars.VAR3" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "value1" : "value2"
    }
  }
}

job_cluster {
  job_cluster_key = "shared_job_cluster"

  new_cluster {
    num_workers                 = 2
    spark_version               = local.spark_version
    node_type_id                = "Standard_DS3_v2"
    policy_id                   = databricks_cluster_policy.policy.id
    apply_policy_default_values = true
  }
}
```
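For context, the policy resource (omitted above) is essentially the locals map passed to `jsonencode`, along these lines (simplified sketch, names illustrative):
```hcl
# Simplified sketch of the policy resource referenced by the job cluster above.
resource "databricks_cluster_policy" "policy" {
  name       = "environment-policy"
  definition = jsonencode(local.policy_environmental_variables)
}
```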
Labels: Spark
Accepted Solutions
2 weeks ago
The issue you're encountering stems from how the `apply_policy_default_values` parameter works in Databricks' Terraform provider. While this parameter is intended to apply default values defined in a cluster policy, it does not automatically populate environment variables (`spark_env_vars`) specified in the policy when creating a cluster. Here's how you can resolve this:
Explanation and Solution
Problem
The `apply_policy_default_values` parameter applies default values for missing cluster attributes defined in the policy, but it does not handle nested attributes like `spark_env_vars` directly. This behavior is consistent with the documentation, which states that default values for nested attributes must still be explicitly provided when creating a cluster through Terraform.
Solution
To achieve your desired behavior:
1. Explicitly Pass Environment Variables: You need to explicitly pass the `spark_env_vars` when defining the cluster, even if they are fixed in the policy. The `apply_policy_default_values` parameter does not automatically propagate these values.
2. Use Policy Definition to Restrict Configuration: Define the environment variables as fixed in your cluster policy to prevent users from modifying them. However, you must still explicitly reference these variables in your Terraform script.
Updated Code Example
Here’s how you can modify your Terraform script:
Cluster Policy Definition
```hcl
resource "databricks_cluster_policy" "policy" {
name = "Environment Policy"
definition = jsonencode({
"spark_env_vars.VAR1": {
"type": "fixed",
"value": var.environment == "value1" ? "" : "value2"
},
"spark_env_vars.VAR2": {
"type": "fixed",
"value": var.environment == "value1" ? "value1" : "value2"
},
"spark_env_vars.VAR3": {
"type": "fixed",
"value": var.environment == "value1" ? "value1" : "value2"
}
})
}
```
Cluster Definition
```hcl
resource "databricks_cluster" "cluster" {
num_workers = 2
spark_version = local.spark_version
node_type_id = "Standard_DS3_v2"
policy_id = databricks_cluster_policy.policy.id
apply_policy_default_values = true
spark_env_vars = {
VAR1 = var.environment == "value1" ? "" : "value2"
VAR2 = var.environment == "value1" ? "value1" : "value2"
VAR3 = var.environment == "value1" ? "value1" : "value2"
}
}
```
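If you want to avoid repeating the same ternary expressions in both the policy and the cluster, one option is to derive both from a single local map. This is a sketch using a standard HCL `for` expression; resource names are illustrative:
```hcl
locals {
  # Single source of truth for the environment variables.
  env_vars = {
    VAR1 = var.environment == "value1" ? "" : "value2"
    VAR2 = var.environment == "value1" ? "value1" : "value2"
    VAR3 = var.environment == "value1" ? "value1" : "value2"
  }
}

resource "databricks_cluster_policy" "policy" {
  name = "Environment Policy"
  # Build the "spark_env_vars.<NAME>" policy entries from the map above.
  definition = jsonencode({
    for name, value in local.env_vars :
    "spark_env_vars.${name}" => { type = "fixed", value = value }
  })
}

resource "databricks_cluster" "cluster" {
  num_workers                 = 2
  spark_version               = local.spark_version
  node_type_id                = "Standard_DS3_v2"
  policy_id                   = databricks_cluster_policy.policy.id
  apply_policy_default_values = true
  # Reuse the same map so the cluster always matches the policy.
  spark_env_vars              = local.env_vars
}
```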
Key Points
- Environment Variables: These must be explicitly passed under `spark_env_vars` in the cluster definition, even if fixed in the policy.
- Policy Enforcement: The policy ensures that users cannot override these values via UI or API.
- Terraform Behavior: The Terraform provider does not automatically propagate nested default values from policies; manual inclusion is required.
By combining explicit variable passing and policy enforcement, you can ensure that your clusters inherit the desired environment variables while adhering to the policy constraints.
2 weeks ago
I understand your answer; what puzzles me is the following: if I fix the env variables in the policy through the UI, they propagate to the clusters that policy is assigned to, whereas with Terraform, as per your answer, I need to specify the variables inside the cluster itself.
2 weeks ago
You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy. However, when using Terraform, this behavior does not occur automatically due to how the Terraform provider currently handles cluster policies and nested attributes.
Why This Happens
The difference arises from the implementation of the Databricks Terraform provider:
- UI Behavior: The Databricks UI directly enforces policy constraints and propagates fixed values (like `spark_env_vars`) to clusters created under the policy.
- Terraform Behavior: The Terraform provider requires explicit specification of nested attributes like `spark_env_vars` in the cluster definition, even if they are fixed in the policy. The `apply_policy_default_values` parameter only applies default values for top-level attributes, not nested ones.
This is a limitation of the Terraform provider's design, and it has been noted by users in various forums and GitHub issues. The provider does not yet fully replicate the behavior of the Databricks UI when it comes to applying policies, especially for nested configurations like environment variables.
Potential Workarounds
If you want to mimic the UI behavior in Terraform, here are some approaches:
1. Explicitly Define Environment Variables in Cluster Configuration
As mentioned earlier, explicitly set `spark_env_vars` in your cluster definition based on your policy. While this requires additional effort, it ensures consistency with your policy.
2. Use a Script or Module to Automate Variable Propagation
Create a Terraform module or script that dynamically reads the policy definition (e.g., via the Databricks API or a data source) and applies its fixed values to clusters. This requires some custom plumbing, but it can automate variable propagation (see the sketch after this list).
3. Raise an Issue with the Terraform Provider
If this behavior is critical for your workflow, consider raising an issue on the [Databricks Terraform provider GitHub repository](https://github.com/databricks/terraform-provider-databricks). This could help bring attention to the limitation and potentially lead to improvements in future releases.
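As a rough illustration of option 2, assuming the policy already exists in the workspace and can be read back via the provider's `databricks_cluster_policy` data source (verify the attribute names against your provider version), you could decode its definition and feed the fixed `spark_env_vars` back into the cluster:
```hcl
# Sketch only: reads the policy definition back and reuses its fixed
# spark_env_vars entries for the cluster. Requires Terraform >= 1.3
# for startswith(); names here are illustrative.
data "databricks_cluster_policy" "env_policy" {
  name = "Environment Policy"
}

locals {
  policy_def = jsondecode(data.databricks_cluster_policy.env_policy.definition)

  # Keep only fixed spark_env_vars.* entries and strip the prefix,
  # e.g. "spark_env_vars.VAR1" => "VAR1".
  propagated_env_vars = {
    for key, attr in local.policy_def :
    trimprefix(key, "spark_env_vars.") => attr.value
    if startswith(key, "spark_env_vars.") && attr.type == "fixed"
  }
}

resource "databricks_cluster" "cluster" {
  num_workers                 = 2
  spark_version               = local.spark_version
  node_type_id                = "Standard_DS3_v2"
  policy_id                   = data.databricks_cluster_policy.env_policy.id
  apply_policy_default_values = true
  spark_env_vars              = local.propagated_env_vars
}
```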
Summary
The discrepancy between UI and Terraform arises because the Terraform provider does not automatically propagate nested attributes like `spark_env_vars` from policies to clusters. While this is frustrating, explicitly defining these variables in your cluster configuration is currently required when using Terraform. For now, leveraging automation or scripting can help bridge this gap until the provider's functionality improves.

