Hi Team,
Here are a few suggestions that may help.
Start with environment-based policies: dev, qa, prod
These policies define the broadest guardrails (security, cost control, stability)
Add team-specific variants only if required
For example, the prod cluster policy is shared unless Data Engineering needs a special Spark config
Use one base policy per environment, and define optional team-specific overlays when needed
Below are two sample policy templates: a shared QA policy, followed by a Prod policy for Data Engineering.
{
  "name": "qa-shared-policy",
  "definition": {
    "spark_version": { "type": "fixed", "value": "<DBR>" },
    "node_type_id": {
      "type": "allowlist",
      "values": ["m5.xlarge"]
    },
    "autoscale.min_workers": { "type": "fixed", "value": 2 },
    "autoscale.max_workers": { "type": "fixed", "value": 6 },
    "enable_elastic_disk": { "type": "fixed", "value": true },
    "init_scripts": { "type": "forbidden" },
    "aws_attributes.availability": { "type": "fixed", "value": "SPOT" },
    "custom_tags.environment": { "type": "fixed", "value": "qa" }
  }
}
{
  "name": "prod-data-engineering",
  "definition": {
    "spark_version": { "type": "fixed", "value": "<DBR>" },
    "node_type_id": {
      "type": "allowlist",
      "values": ["m5.2xlarge"]
    },
    "autoscale.min_workers": { "type": "fixed", "value": 2 },
    "autoscale.max_workers": { "type": "fixed", "value": 10 },
    "enable_elastic_disk": { "type": "fixed", "value": true },
    "init_scripts": { "type": "forbidden" },
    "aws_attributes.availability": { "type": "fixed", "value": "ON_DEMAND" },
    "data_security_mode": { "type": "fixed", "value": "USER_ISOLATION" },
    "custom_tags.team": { "type": "fixed", "value": "data-eng" },
    "custom_tags.environment": { "type": "fixed", "value": "prod" }
  }
}
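The "one base policy per environment plus optional team overlays" idea can be sketched in a few lines of Python. This is only an illustration, not a Databricks feature: `apply_overlay` and its `locked` parameter are hypothetical names I made up, and the base definition here is a trimmed-down version of the prod template.

```python
import copy
import json


def apply_overlay(base, overlay, locked=()):
    """Return a new policy definition: the base rules plus team-specific overrides.

    `locked` lists attribute paths the overlay may not change (a hypothetical
    safeguard so a team overlay cannot loosen the environment guardrails).
    """
    clashes = [key for key in overlay if key in locked and overlay[key] != base.get(key)]
    if clashes:
        raise ValueError(f"overlay may not change locked attributes: {clashes}")
    merged = copy.deepcopy(base)           # leave the shared base untouched
    merged.update(copy.deepcopy(overlay))  # overlay rules win on conflicts
    return merged


# Trimmed-down prod base, reusing attribute shapes from the templates.
prod_base = {
    "autoscale.max_workers": {"type": "fixed", "value": 10},
    "aws_attributes.availability": {"type": "fixed", "value": "ON_DEMAND"},
    "custom_tags.environment": {"type": "fixed", "value": "prod"},
}

# Team-specific additions for Data Engineering.
data_eng_overlay = {
    "data_security_mode": {"type": "fixed", "value": "USER_ISOLATION"},
    "custom_tags.team": {"type": "fixed", "value": "data-eng"},
}

prod_data_eng = apply_overlay(
    prod_base,
    data_eng_overlay,
    locked=("aws_attributes.availability", "custom_tags.environment"),
)
print(json.dumps(prod_data_eng, indent=2))
```

Keeping the base in one place means a guardrail change (say, a new max worker limit) only has to be made once, and each team policy is regenerated from it.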
Each environment typically enforces a distinct set of restrictions based on its purpose. Dev and QA policies often allow more flexibility to support experimentation and testing. Spot instances are usually allowed in Dev to reduce cost, while in QA they may be optional depending on workload criticality. Public IPs are typically disallowed in all environments to maintain network security. Dev clusters generally enforce small, cost-effective node types with autoscaling enabled and low worker limits (e.g., 1 to 4 workers). Init scripts are usually permitted in Dev for experimentation, tightly controlled or disabled in QA, and disallowed in Prod to protect production stability.
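As a rough sketch of what a Dev policy along those lines could look like, the snippet below builds one in Python. The policy name, the node types, and the `build_dev_policy` helper are placeholders I chose for illustration. It uses a `range` rule so users can pick a worker count up to the cap, and it serializes the definition to a JSON string, which, as far as I know, is the form the Cluster Policies REST API expects for `definition`.

```python
import json


def build_dev_policy(max_workers_cap=4):
    """Build a create-policy payload for a small, low-cost Dev policy (illustrative)."""
    definition = {
        # Placeholder instance types; substitute what your workspace offers.
        "node_type_id": {"type": "allowlist", "values": ["m5.large", "m5.xlarge"]},
        "autoscale.min_workers": {"type": "fixed", "value": 1},
        # A range rule lets users choose any value up to the cap.
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": max_workers_cap,
            "defaultValue": 2,
        },
        # Spot with on-demand fallback keeps Dev cheap without blocking cluster start.
        "aws_attributes.availability": {"type": "fixed", "value": "SPOT_WITH_FALLBACK"},
        "custom_tags.environment": {"type": "fixed", "value": "dev"},
    }
    return {"name": "dev-shared-policy", "definition": json.dumps(definition)}


payload = build_dev_policy()
print(payload["name"])
print(json.dumps(json.loads(payload["definition"]), indent=2))
```

Init scripts are simply left unrestricted here, which matches the "permitted in Dev" guidance.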
Prod policies, in contrast, are much more restrictive. Spot instances and user-defined init scripts are usually disabled to ensure reliability and reduce the risk of unexpected behavior. Node types are limited to stable, high-performance instances. Autoscaling stays enabled, but with a higher upper bound to handle larger workloads. Runtime versions are often pinned and reviewed for compatibility and security, and a data security mode is enforced (e.g., USER_ISOLATION or SINGLE_USER when using Unity Catalog). In addition, mandatory tagging (such as team, environment, cost_center) is enforced across all environments to support cost attribution, auditing, and governance.
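One way to keep the mandatory-tagging rule honest is a small check that runs over each policy definition before it is deployed. The sketch below is an assumption-laden example: `missing_required_tags` is a hypothetical helper, the required tag list is taken from the paragraph above, and a tag counts as enforced only if the policy has a `custom_tags.<name>` rule that is not marked `isOptional`.

```python
REQUIRED_TAGS = ("team", "environment", "cost_center")


def missing_required_tags(definition, required=REQUIRED_TAGS):
    """Return the required tags that the policy definition does not enforce."""
    missing = []
    for tag in required:
        rule = definition.get(f"custom_tags.{tag}")
        # No rule at all, or a rule flagged optional, means the tag is not enforced.
        if rule is None or rule.get("isOptional", False):
            missing.append(tag)
    return missing


# Tag rules taken from the prod template; it has no cost_center rule.
prod_definition = {
    "custom_tags.team": {"type": "fixed", "value": "data-eng"},
    "custom_tags.environment": {"type": "fixed", "value": "prod"},
}

print(missing_required_tags(prod_definition))  # ['cost_center']
```

A check like this can run in CI against the policy JSON files so a missing tag rule is caught before the policy reaches the workspace.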
Please also refer to the documentation below:
https://docs.databricks.com/aws/en/security
https://docs.databricks.com/aws/en/data-governance/unity-catalog
Hope this helps!