Re: Policy for DLT

ankit001mittal · 06-04-2025

Hi,
I am trying to define a cluster policy for our DLT pipelines, and I would like to pin a specific Spark version, as in the example below:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "allowlist",
    "values": [
      "14.3.x-scala2.12"
    ]
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "Standard_DS3_v2",
    "isOptional": true
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 4,
    "isOptional": true
  },
  "azure_attributes.availability": {
    "type": "unlimited",
    "defaultValue": "SPOT_WITH_FALLBACK_AZURE"
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  }
}

But I am getting this error in my pipeline:

INVALID_PARAMETER_VALUE: [DLT ERROR CODE: INVALID_CLUSTER_SETTING.CLIENT_ERROR] The cluster policy specified in the pipeline settings is not compatible with Delta Live Tables. Remove 'spark_version' from your cluster policy.

Could you please help me with it?

lingareddy_Alva · 06-04-2025

Hi @ankit001mittal

You're getting this error because Delta Live Tables (DLT) manages certain cluster settings itself, including the Spark version. A DLT pipeline runs on a DLT runtime that DLT selects for you, so a policy that also constrains spark_version conflicts with that choice, and DLT rejects the policy. Letting users pin their own Spark version could also put the pipeline on a runtime that DLT's features are not compatible with.
Here's how to fix your cluster policy for DLT pipelines:

Remove the spark_version constraint from your policy:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "Standard_DS3_v2",
    "isOptional": true
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 4,
    "isOptional": true
  },
  "azure_attributes.availability": {
    "type": "unlimited",
    "defaultValue": "SPOT_WITH_FALLBACK_AZURE"
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  }
}
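If you generate policies from a script, the same fix can be applied mechanically. Below is a minimal sketch in plain Python (standard library only) that returns a copy of a policy definition without the keys DLT rejects. The set DLT_INCOMPATIBLE_KEYS and the function make_dlt_compatible are hypothetical names for illustration; only spark_version is confirmed as incompatible by the error message above.

```python
import json

# Keys DLT rejects in a cluster policy. Only "spark_version" is confirmed
# by the error in this thread; extend the set only if DLT reports others.
DLT_INCOMPATIBLE_KEYS = {"spark_version"}


def make_dlt_compatible(policy: dict) -> dict:
    """Return a copy of the policy definition without DLT-incompatible keys."""
    return {k: v for k, v in policy.items() if k not in DLT_INCOMPATIBLE_KEYS}


# Trimmed version of the policy from the original question.
original_policy = {
    "spark_version": {"type": "allowlist", "values": ["14.3.x-scala2.12"]},
    "num_workers": {"type": "unlimited", "defaultValue": 4, "isOptional": True},
    "cluster_type": {"type": "fixed", "value": "dlt"},
}

dlt_policy = make_dlt_compatible(original_policy)
# Serialize to the JSON you would paste into the policy definition.
print(json.dumps(dlt_policy, indent=2))
```

The function builds a new dict instead of deleting keys in place, so the original policy (for example, one you also use for regular clusters) is left untouched.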

Why this happens:
1. DLT runtime management: DLT automatically selects and manages the Spark version based on the DLT runtime and the channel (CURRENT or PREVIEW) your pipeline uses.
2. Compatibility: DLT relies on optimizations and features that are only available on the runtimes it selects.
3. Automatic updates: DLT rolls out runtime and Spark version updates itself as part of the managed service.

Alternative approaches if you need version control:
1. Use DLT channels: instead of specifying a Spark version, choose which DLT channel your pipeline uses (CURRENT or PREVIEW) in the pipeline settings.
2. Separate policies: keep one cluster policy for DLT pipelines (without spark_version) and another for regular clusters (with the spark_version constraint).
3. Pipeline-level configuration: put any runtime-related requirements in the pipeline settings rather than in the cluster policy.
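As a rough illustration of points 1 and 2, the sketch below builds two policy definitions from shared settings, one for DLT without spark_version and one for regular clusters with it, plus a pipeline settings fragment that selects a channel and references the DLT policy. The pipeline keys (channel, clusters, label, policy_id, apply_policy_default_values) reflect my understanding of the DLT pipeline settings JSON and should be checked against the Databricks documentation; "my_pipeline" and "<dlt-policy-id>" are hypothetical placeholders.

```python
import json

# Settings shared by both policies (trimmed from the thread's example).
shared = {
    "node_type_id": {"type": "unlimited", "defaultValue": "Standard_DS3_v2", "isOptional": True},
    "azure_attributes.availability": {"type": "unlimited", "defaultValue": "SPOT_WITH_FALLBACK_AZURE"},
}

# Policy for DLT pipelines: no spark_version, cluster_type fixed to "dlt".
dlt_policy = {**shared, "cluster_type": {"type": "fixed", "value": "dlt"}}

# Policy for regular clusters: the Spark version is pinned here instead.
regular_policy = {
    **shared,
    "spark_version": {"type": "allowlist", "values": ["14.3.x-scala2.12"]},
}

# Assumed pipeline settings fragment: the runtime follows the channel,
# and the cluster points at the DLT policy. Name and ID are placeholders.
pipeline_settings = {
    "name": "my_pipeline",
    "channel": "CURRENT",  # or "PREVIEW"
    "clusters": [
        {
            "label": "default",
            "policy_id": "<dlt-policy-id>",
            "apply_policy_default_values": True,
        }
    ],
}

for name, obj in [
    ("dlt_policy", dlt_policy),
    ("regular_policy", regular_policy),
    ("pipeline_settings", pipeline_settings),
]:
    print(name)
    print(json.dumps(obj, indent=2))
```

Sharing the common settings through one dict keeps the two policies from drifting apart, while the only difference between them is the spark_version and cluster_type entries.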

LR