2 weeks ago
Hi, I'm trying to create a Terraform script that does the following:
- create a policy where I specify env variables and libraries
- create a cluster that inherits from that policy and uses the env variables specified in the policy.
I saw in the documentation the `apply_policy_default_values` attribute, which should set the env variables as defined in the policy definition, but it doesn't seem to do that.
How can I get the desired behavior?
Here's the code:
```hcl
locals {
  policy_environmental_variables = {
    "spark_env_vars.VAR1" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "" : "value2"
    },
    "spark_env_vars.VAR2" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "value1" : "value2"
    },
    "spark_env_vars.VAR3" : {
      "type" : "fixed",
      "value" : var.environment == "value1" ? "value1" : "value2"
    }
  }
}

job_cluster {
  job_cluster_key = "shared_job_cluster"

  new_cluster {
    num_workers                 = 2
    spark_version               = local.spark_version
    node_type_id                = "Standard_DS3_v2"
    policy_id                   = databricks_cluster_policy.policy.id
    apply_policy_default_values = true
  }
}
```
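For context, the policy resource (omitted above) is essentially the locals map passed to `jsonencode`, along these lines (simplified sketch, names illustrative):
```hcl
# Simplified sketch of the policy resource referenced by the job cluster above.
resource "databricks_cluster_policy" "policy" {
  name       = "environment-policy"
  definition = jsonencode(local.policy_environmental_variables)
}
```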
Labels: Spark
Accepted Solutions
2 weeks ago
The issue you're encountering stems from how the `apply_policy_default_values` parameter works in Databricks' Terraform provider. While this parameter is intended to apply default values defined in a cluster policy, it does not automatically populate environment variables (`spark_env_vars`) specified in the policy when creating a cluster. Here's how you can resolve this:
Explanation and Solution
Problem
The `apply_policy_default_values` parameter applies default values for missing cluster attributes defined in the policy, but it does not handle nested attributes like `spark_env_vars` directly. This behavior is consistent with the documentation, which states that default values for nested attributes must still be explicitly provided when creating a cluster through Terraform.
Solution
To achieve your desired behavior:
1. Explicitly Pass Environment Variables: You need to explicitly pass the `spark_env_vars` when defining the cluster, even if they are fixed in the policy. The `apply_policy_default_values` parameter does not automatically propagate these values.
2. Use Policy Definition to Restrict Configuration: Define the environment variables as fixed in your cluster policy to prevent users from modifying them. However, you must still explicitly reference these variables in your Terraform script.
Updated Code Example
Here’s how you can modify your Terraform script:
Cluster Policy Definition
```hcl
resource "databricks_cluster_policy" "policy" {
name = "Environment Policy"
definition = jsonencode({
"spark_env_vars.VAR1": {
"type": "fixed",
"value": var.environment == "value1" ? "" : "value2"
},
"spark_env_vars.VAR2": {
"type": "fixed",
"value": var.environment == "value1" ? "value1" : "value2"
},
"spark_env_vars.VAR3": {
"type": "fixed",
"value": var.environment == "value1" ? "value1" : "value2"
}
})
}
```
Cluster Definition
```hcl
resource "databricks_cluster" "cluster" {
num_workers = 2
spark_version = local.spark_version
node_type_id = "Standard_DS3_v2"
policy_id = databricks_cluster_policy.policy.id
apply_policy_default_values = true
spark_env_vars = {
VAR1 = var.environment == "value1" ? "" : "value2"
VAR2 = var.environment == "value1" ? "value1" : "value2"
VAR3 = var.environment == "value1" ? "value1" : "value2"
}
}
```
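If you want to avoid repeating the same ternary expressions in both the policy and the cluster, one option is to derive both from a single local map. This is a sketch using a standard HCL `for` expression; resource names are illustrative:
```hcl
locals {
  # Single source of truth for the environment variables.
  env_vars = {
    VAR1 = var.environment == "value1" ? "" : "value2"
    VAR2 = var.environment == "value1" ? "value1" : "value2"
    VAR3 = var.environment == "value1" ? "value1" : "value2"
  }
}

resource "databricks_cluster_policy" "policy" {
  name = "Environment Policy"
  # Build the "spark_env_vars.<NAME>" policy entries from the map above.
  definition = jsonencode({
    for name, value in local.env_vars :
    "spark_env_vars.${name}" => { type = "fixed", value = value }
  })
}

resource "databricks_cluster" "cluster" {
  num_workers                 = 2
  spark_version               = local.spark_version
  node_type_id                = "Standard_DS3_v2"
  policy_id                   = databricks_cluster_policy.policy.id
  apply_policy_default_values = true
  # Reuse the same map so the cluster always matches the policy.
  spark_env_vars              = local.env_vars
}
```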
Key Points
- Environment Variables: These must be explicitly passed under `spark_env_vars` in the cluster definition, even if fixed in the policy.
- Policy Enforcement: The policy ensures that users cannot override these values via UI or API.
- Terraform Behavior: The Terraform provider does not automatically propagate nested default values from policies; manual inclusion is required.
By combining explicit variable passing and policy enforcement, you can ensure that your clusters inherit the desired environment variables while adhering to the policy constraints.
2 weeks ago
I understand your answer; what puzzles me is the following: if I fix the env variables in the policy through the UI, they propagate to the clusters that policy is assigned to, whereas with Terraform, as per your answer, I need to specify the variables inside the cluster itself.
2 weeks ago
You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy. However, when using Terraform, this behavior does not occur automatically due to how the Terraform provider currently handles cluster policies and nested attributes.
Why This Happens
The difference arises from the implementation of the Databricks Terraform provider:
- UI Behavior: The Databricks UI directly enforces policy constraints and propagates fixed values (like `spark_env_vars`) to clusters created under the policy.
- Terraform Behavior: The Terraform provider requires explicit specification of nested attributes like `spark_env_vars` in the cluster definition, even if they are fixed in the policy. The `apply_policy_default_values` parameter only applies default values for top-level attributes, not nested ones.
This is a limitation of the Terraform provider's design, and it has been noted by users in various forums and GitHub issues. The provider does not yet fully replicate the behavior of the Databricks UI when it comes to applying policies, especially for nested configurations like environment variables.
Potential Workarounds
If you want to mimic the UI behavior in Terraform, here are some approaches:
1. Explicitly Define Environment Variables in Cluster Configuration
As mentioned earlier, explicitly set `spark_env_vars` in your cluster definition based on your policy. While this requires additional effort, it ensures consistency with your policy.
2. Use a Script or Module to Automate Variable Propagation
Create a Terraform module or script that dynamically reads the policy definition (e.g., via the Databricks API or a data source) and applies its fixed values to clusters. This requires some custom plumbing, but it can automate variable propagation (see the sketch after this list).
3. Raise an Issue with the Terraform Provider
If this behavior is critical for your workflow, consider raising an issue on the [Databricks Terraform provider GitHub repository](https://github.com/databricks/terraform-provider-databricks). This could help bring attention to the limitation and potentially lead to improvements in future releases.
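As a rough illustration of option 2, assuming the policy already exists in the workspace and can be read back via the provider's `databricks_cluster_policy` data source (verify the attribute names against your provider version), you could decode its definition and feed the fixed `spark_env_vars` back into the cluster:
```hcl
# Sketch only: reads the policy definition back and reuses its fixed
# spark_env_vars entries for the cluster. Requires Terraform >= 1.3
# for startswith(); names here are illustrative.
data "databricks_cluster_policy" "env_policy" {
  name = "Environment Policy"
}

locals {
  policy_def = jsondecode(data.databricks_cluster_policy.env_policy.definition)

  # Keep only fixed spark_env_vars.* entries and strip the prefix,
  # e.g. "spark_env_vars.VAR1" => "VAR1".
  propagated_env_vars = {
    for key, attr in local.policy_def :
    trimprefix(key, "spark_env_vars.") => attr.value
    if startswith(key, "spark_env_vars.") && attr.type == "fixed"
  }
}

resource "databricks_cluster" "cluster" {
  num_workers                 = 2
  spark_version               = local.spark_version
  node_type_id                = "Standard_DS3_v2"
  policy_id                   = data.databricks_cluster_policy.env_policy.id
  apply_policy_default_values = true
  spark_env_vars              = local.propagated_env_vars
}
```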
Summary
The discrepancy between UI and Terraform arises because the Terraform provider does not automatically propagate nested attributes like `spark_env_vars` from policies to clusters. While this is frustrating, explicitly defining these variables in your cluster configuration is currently required when using Terraform. For now, leveraging automation or scripting can help bridge this gap until the provider's functionality improves.

