I want to create a cluster policy that is referenced by most of our repos/jobs, so we have a single place to update whenever the Spark version changes or we need to add additional Spark configurations. Cluster policies seemed like a good fit for this because the alternative, a shared library, would require the pipeline to be re-triggered after every library edit before the changes take effect on the actual job.
Here is the cluster policy I would like to use:
{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*, 4]",
    "hidden": true
  },
  "spark_conf.spark.databricks.dataLineage.enabled": {
    "type": "fixed",
    "value": "true",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "fixed",
    "value": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  }
}
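For context, here is roughly how I register this policy through the Cluster Policies API. This is just a minimal sketch using the Python requests library; DATABRICKS_HOST and TOKEN are placeholders for our workspace URL and a personal access token, and I'm assuming the policy JSON above is saved locally as cluster_policy.json.

import json
import requests

# Placeholders for our workspace URL and a personal access token.
DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# Load the policy JSON shown above (assumed to be saved as cluster_policy.json).
with open("cluster_policy.json") as f:
    policy_definition = json.load(f)

# The Cluster Policies API expects the definition as a JSON-encoded string.
response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "single-node-job-policy",
        "definition": json.dumps(policy_definition),
    },
)
policy_id = response.json()["policy_id"]

The returned policy_id is what I intend to plug into the Jobs API payload below.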
Here is the JSON body I'm sending to the Jobs API, showing how I want to reference the cluster policy:
json={
    "job_id": job_id[0],
    "new_settings": {
        "name": job_config['job_name'],
        "new_cluster": {
            "cluster_policy_id": "<cluster_policy_id>",
            "spark_conf": {
                "spark.databricks.sql.initial.catalog.name": default_catalog
            },
        },
    },
}
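And here is a sketch of the actual call I'm making, assuming the Jobs API 2.1 reset endpoint (which overwrites the full job settings); job_id, job_config, and default_catalog are defined elsewhere in the pipeline, and DATABRICKS_HOST and TOKEN are the same placeholders as above.

import requests

# Placeholders for our workspace URL and a personal access token.
DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# job_id, job_config, and default_catalog come from the surrounding pipeline code.
response = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/reset",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": job_id[0],
        "new_settings": {
            "name": job_config["job_name"],
            "new_cluster": {
                "cluster_policy_id": "<cluster_policy_id>",
                "spark_conf": {
                    "spark.databricks.sql.initial.catalog.name": default_catalog
                },
            },
        },
    },
)
response.raise_for_status()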
Has anyone succeeded in using cluster policies with the Jobs API to specify cluster parameters, instead of spelling out the cluster parameters in the API call itself?