Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Policy for DLT

ankit001mittal
New Contributor III

Hi,
I am trying to define a cluster policy for our DLT pipelines, and I would like to pin a specific Spark version, as in the example below:

 

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "allowlist",
    "values": [
      "14.3.x-scala2.12"
    ]
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "Standard_DS3_v2",
    "isOptional": true
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 4,
    "isOptional": true
  },
  "azure_attributes.availability": {
    "type": "unlimited",
    "defaultValue": "SPOT_WITH_FALLBACK_AZURE"
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  }
}

 

 But I am getting this error in my pipeline:

INVALID_PARAMETER_VALUE: [DLT ERROR CODE: INVALID_CLUSTER_SETTING.CLIENT_ERROR] The cluster policy specified in the pipeline settings is not compatible with Delta Live Tables. Remove 'spark_version' from your cluster policy.

Could you please help me with it?


lingareddy_Alva
Honored Contributor II

Hi @ankit001mittal 

You're seeing this error because Delta Live Tables (DLT) automatically manages certain cluster configurations, including the Spark version. DLT pipelines run on Spark versions matched to the DLT runtime, so letting users pin a custom Spark version in the cluster policy can cause compatibility issues.
Here's how to fix your cluster policy for DLT pipelines:

Remove the spark_version constraint from your policy:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "Standard_DS3_v2",
    "isOptional": true
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 4,
    "isOptional": true
  },
  "azure_attributes.availability": {
    "type": "unlimited",
    "defaultValue": "SPOT_WITH_FALLBACK_AZURE"
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  }
}
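
Once the policy is updated, you attach it to the pipeline through the pipeline's own settings JSON rather than by pinning a Spark version. Here is a minimal sketch of what that could look like (the pipeline name, policy ID, and notebook path are placeholders for illustration, not values from your workspace):

{
  "name": "my-dlt-pipeline",
  "edition": "ADVANCED",
  "clusters": [
    {
      "label": "default",
      "policy_id": "<your-dlt-policy-id>",
      "num_workers": 4
    }
  ],
  "libraries": [
    {
      "notebook": {
        "path": "/Repos/project/dlt_pipeline_notebook"
      }
    }
  ],
  "continuous": false
}

Note that there is no spark_version anywhere in the pipeline settings either; DLT resolves the runtime itself.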


Why this happens:
1. DLT Runtime Management: DLT automatically selects and manages the appropriate Spark version based on the DLT runtime version and channel (current/preview) you're using
2. Compatibility: DLT includes specific optimizations and features that require particular Spark versions
3. Automatic Updates: DLT handles Spark version updates as part of its managed service approach


Alternative approaches if you need version control:
1. Use DLT Runtime Channels: Instead of specifying Spark versions, you can control which DLT runtime channel your pipeline uses (current vs preview) in the pipeline configuration (see the sketch after this list)
2. Separate Policies: Consider having separate cluster policies - one for DLT pipelines (without spark_version) and another for regular clusters (with spark_version constraints)
3. Pipeline-Level Configuration: Set any specific runtime requirements at the pipeline level rather than the cluster policy level
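
For point 1, a rough sketch of how the channel is chosen at the pipeline level (the pipeline name is again a placeholder): the channel field in the pipeline settings JSON accepts CURRENT (the default) or PREVIEW, and DLT then picks the matching Spark version for you.

{
  "name": "my-dlt-pipeline",
  "channel": "PREVIEW"
}

This fragment shows only the relevant field; the rest of your pipeline settings stay as they are.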

 

LR
