To create or update a job with the Databricks SDK for Python (for example, from an Azure DevOps pipeline) and pin its job cluster to a specific cluster policy, reference the policy by its ID in the job_clusters section of the job definition. The cluster spawned for each run then adheres to the constraints and default settings of that policy.
Key Steps
- Use the SDK and set policy_id in the job cluster's new_cluster definition.
- Reference the policy you want by its ID, not its name (a lookup sketch follows this list).
- Minimal cluster parameters required by the policy and by Databricks still need to be set in the job cluster.
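If you only know the policy's display name, you can resolve it to an ID at runtime. A minimal sketch, assuming a policy named "my-job-policy" exists in the workspace (the name is an illustrative placeholder):
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Resolve the cluster policy ID from its display name ("my-job-policy" is a placeholder).
policy_id = next(
    p.policy_id for p in w.cluster_policies.list() if p.name == "my-job-policy"
)
print(policy_id)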
Example Python SDK Job Definition
Below is a simplified example using the Databricks SDK for Python's typed request classes that accomplishes this:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

created_job = w.jobs.create(
    name="Example Job with Cluster Policy",
    tasks=[
        jobs.Task(
            task_key="do_something",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Repos/your_user/your_repo/your_notebook"
            ),
            job_cluster_key="the-job-cluster",
        )
    ],
    job_clusters=[
        jobs.JobCluster(
            job_cluster_key="the-job-cluster",
            new_cluster=compute.ClusterSpec(
                spark_version="13.3.x-scala2.12",
                node_type_id="Standard_DS3_v2",
                policy_id="<YOUR_POLICY_ID>",  # the cluster policy to enforce
                num_workers=2,
                # Optionally, other minimal parameters allowed by the policy
            ),
        )
    ],
)
Replace <YOUR_POLICY_ID> with the appropriate cluster policy ID. The key field is policy_id on the new_cluster spec; it applies all of the policy's enforcement rules and default settings to the job cluster created for each run. Refer to the Databricks SDK for Python documentation and examples for details.
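The same approach works when updating an existing job. A minimal sketch, assuming the job ID is already known (existing_job_id below is a hypothetical value) and using jobs.update, which changes only the fields supplied in new_settings:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

existing_job_id = 123456789  # hypothetical ID of the job to update

# update() performs a partial update: only the fields set in new_settings change.
w.jobs.update(
    job_id=existing_job_id,
    new_settings=jobs.JobSettings(
        job_clusters=[
            jobs.JobCluster(
                job_cluster_key="the-job-cluster",
                new_cluster=compute.ClusterSpec(
                    spark_version="13.3.x-scala2.12",
                    node_type_id="Standard_DS3_v2",
                    policy_id="<YOUR_POLICY_ID>",
                    num_workers=2,
                ),
            )
        ],
    ),
)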
Additional Notes
- The cluster policy cannot be specified by name; only by policy ID.
- The minimal parameters you provide (like spark_version) must align with what the policy allows. The cluster launched for a job run follows all required and default attributes set in the referenced policy (see the sketch after these notes).
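To illustrate what "allowed by the policy" means, here is a hedged sketch of creating a simple policy whose definition fixes the Spark version and caps the worker count; the policy name and rules are illustrative assumptions, not part of the original answer:
import json

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Illustrative policy: fixes spark_version and limits num_workers to at most 4.
definition = json.dumps({
    "spark_version": {"type": "fixed", "value": "13.3.x-scala2.12"},
    "num_workers": {"type": "range", "maxValue": 4, "defaultValue": 2},
})

policy = w.cluster_policies.create(name="example-job-policy", definition=definition)
print(policy.policy_id)  # use this value as policy_id in the job cluster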