Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Implementing Governance on DLT pipelines using compute policy

DeepankarB
New Contributor III

I am implementing governance over compute creation in our workspaces through custom compute policies for all-purpose, job, and DLT pipeline compute. I was able to create policies for all-purpose and job compute that successfully restrict users to a limited set of instance types. My intention is to grant permissions on these policies to a particular group that owns the workspace, so that this group can create the compute resources the workspace users need.
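As an example, granting the owning group CAN_USE on a policy can be done with the Permissions API for cluster policies. A minimal sketch (the group name and policy ID below are placeholders, not values from my workspace):

PUT /api/2.0/permissions/cluster-policies/<policy-id>

{
  "access_control_list": [
    {
      "group_name": "workspace-owners",
      "permission_level": "CAN_USE"
    }
  ]
}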

I am facing a few issues when trying to implement the same for DLT. I have created a policy that limits the instance types and number of worker nodes that can be used, but I need to understand how it can be enforced in the same way as for all-purpose and job compute.

While I can see the DLT compute policy in the UI when creating a DLT pipeline, selecting it does not restrict the list of instance types shown in the driver and worker type drop-downs. How can we achieve this?

Also, I cannot see an option to select any prebuilt DLT pipeline compute that a developer could use when creating a DLT pipeline. How can we achieve a similar objective in Databricks?

Below is a sample policy for DLT:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "m5a.large",
      "m5a.xlarge",
      "m5a.2xlarge",
      "m5d.large",
      "m5d.xlarge",
      "m5d.2xlarge",
      "m5d.4xlarge",
      "m5d.8xlarge",
      "r5a.large",
      "r5a.xlarge",
      "r5a.2xlarge",
      "r5a.8xlarge",
      "c5a.xlarge",
      "c5a.2xlarge",
      "c5a.4xlarge",
      "c5ad.xlarge",
      "c5ad.2xlarge",
      "c5ad.4xlarge",
      "c5ad.8xlarge",
      "c5ad.12xlarge",
      "c5ad.16xlarge",
      "i3.large",
      "i3.xlarge",
      "i3.2xlarge",
      "i3.4xlarge",
      "i3.8xlarge",
      "i3.16xlarge"
    ],
    "defaultValue": "m5a.large"
  },
  "driver_node_type_id": {
    "type": "allowlist",
    "values": [
      "m5a.large",
      "m5a.xlarge",
      "m5a.2xlarge",
      "m5d.large",
      "m5d.xlarge",
      "m5d.2xlarge",
      "m5d.4xlarge",
      "m5d.8xlarge",
      "r5a.large",
      "r5a.xlarge",
      "r5a.2xlarge",
      "r5a.8xlarge",
      "c5a.xlarge",
      "c5a.2xlarge",
      "c5a.4xlarge",
      "c5ad.xlarge",
      "c5ad.2xlarge",
      "c5ad.4xlarge",
      "c5ad.8xlarge",
      "c5ad.12xlarge",
      "c5ad.16xlarge",
      "i3.large",
      "i3.xlarge",
      "i3.2xlarge",
      "i3.4xlarge",
      "i3.8xlarge",
      "i3.16xlarge"
    ],
    "defaultValue": "m5a.large"
  },
  "num_workers": {
    "type": "range",
    "maxValue": 12,
    "defaultValue": 2,
    "isOptional": true
  },
  "autoscale.min_workers": {
    "type": "range",
    "minValue": 1,
    "maxValue": 4,
    "defaultValue": 1
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 16,
    "defaultValue": 1
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  },
  "data_security_mode": {
    "type": "fixed",
    "value": "USER_ISOLATION",
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "range",
    "minValue": 1,
    "defaultValue": 1
  },
  "aws_attributes.zone_id": {
    "type": "unlimited",
    "defaultValue": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  }
}
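For completeness, a policy definition like the one above can be registered with the Cluster Policies API (POST /api/2.0/policies/clusters/create), where the definition is passed as an escaped JSON string. A minimal sketch, with the definition trimmed and a placeholder policy name:

{
  "name": "dlt-workspace-policy",
  "definition": "{\"cluster_type\": {\"type\": \"fixed\", \"value\": \"dlt\"}, \"node_type_id\": {\"type\": \"allowlist\", \"values\": [\"m5a.large\", \"m5a.xlarge\"]}}"
}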

1 REPLY

Renu_
Contributor

Hi @DeepankarB, to enforce compute policies for DLT pipelines, make sure your policy JSON includes policy_family_id: dlt and set apply_policy_default_values: true in the pipeline cluster settings. This helps apply the instance restrictions correctly in the UI.
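As a rough sketch, the pipeline settings would reference the policy on each cluster entry, something like the following (the policy ID and worker counts are placeholders to adjust to your policy):

{
  "clusters": [
    {
      "label": "default",
      "policy_id": "<your-dlt-policy-id>",
      "apply_policy_default_values": true,
      "autoscale": {
        "min_workers": 1,
        "max_workers": 4
      }
    },
    {
      "label": "maintenance",
      "policy_id": "<your-dlt-policy-id>"
    }
  ]
}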

If you want to provide pre-approved compute setups for developers creating DLT pipelines in Databricks, you can create standard JSON templates with the required policy ID and settings and save them in a shared folder like /Shared/dlt_configs/, or use the REST API to distribute them. Developers can reuse these when creating new pipelines.
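For example, a shared template could be a full create payload for the Pipelines API (POST /api/2.0/pipelines) that already points at the approved policy; the names, paths, and IDs below are placeholders:

{
  "name": "team_dlt_pipeline",
  "catalog": "main",
  "target": "team_schema",
  "continuous": false,
  "libraries": [
    {
      "notebook": {
        "path": "/Shared/dlt_configs/sample_pipeline_notebook"
      }
    }
  ],
  "clusters": [
    {
      "label": "default",
      "policy_id": "<your-dlt-policy-id>",
      "apply_policy_default_values": true
    }
  ]
}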

For reference: Configure compute for a DLT pipeline and Cannot select a compute policy for a DLT Pipeline
