Is there a way to use cluster policies within the Jobs API to define cluster configuration, rather than defining it in the Jobs API request itself?

Colter
New Contributor II

I want to create a cluster policy that is referenced by most of our repos/jobs, so we have one place to update whenever there is a Spark version change or we need to add additional Spark configurations. I figured cluster policies might be a good way to do this, since creating a separate library would require the pipeline to be re-triggered whenever we edit the library for the changes to take effect on the actual job.

Here is the cluster policy I would like to use:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*, 4]",
    "hidden": true
  },
  "spark_conf.spark.databricks.dataLineage.enabled": {
    "type": "fixed",
    "value": "true",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "fixed",
    "value": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  }
}
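
A definition like this is registered once through the Cluster Policies API and then referenced by ID from every job. A minimal sketch of creating it, assuming the workspace URL and a personal access token are exported as DATABRICKS_HOST and DATABRICKS_TOKEN, the definition above is saved in policy.json, and the policy name is arbitrary (all placeholder names):

import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

# The Cluster Policies API expects the policy definition as a JSON string.
with open("policy.json") as f:
    definition = f.read()

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "single-node-job-policy", "definition": definition},
)
resp.raise_for_status()
policy_id = resp.json()["policy_id"]  # reference this ID from your jobs
print(policy_id)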

Here is the JSON for the Jobs API, showing how I want to use cluster policies:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "cluster_policy_id": "<cluster_policy_id>",
      "spark_conf": {
        "spark.databricks.sql.initial.catalog.name": default_catalog
      }
    }
  }
}
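
For reference, a payload like this is typically sent to the Jobs API reset endpoint, which replaces a job's full settings. A sketch, with hypothetical placeholder values for the surrounding variables:

import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

# Hypothetical stand-ins for the variables used in the payload above.
job_id = [123]
job_config = {"job_name": "my-job"}
default_catalog = "main"

# Jobs API 2.1 "reset" overwrites the job's settings with new_settings.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/reset",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": job_id[0],
        "new_settings": {
            "name": job_config['job_name'],
            "new_cluster": {
                "cluster_policy_id": "<cluster_policy_id>",
                "spark_conf": {
                    "spark.databricks.sql.initial.catalog.name": default_catalog
                }
            }
        }
    },
)
resp.raise_for_status()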

Has anyone succeeded in using cluster policies with the Jobs API to specify cluster parameters, instead of specifying them directly in the API request?


Anonymous
Not applicable

@Colter Nattrass:

Yes, it is possible to use cluster policies with the Jobs API to define cluster configuration rather than defining it in the Jobs API request itself. To do this, reference the cluster policy's ID in the new_cluster section of the request (via the policy_id field) instead of defining the cluster configuration directly. Here is an example:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "policy_id": "<cluster_policy_id>"
    }
  }
}
 

In the policy_id field, replace <cluster_policy_id> with the actual ID of your cluster policy. This applies the cluster policy's configuration settings to the cluster used by the job.

Note that some configuration settings cannot be set via cluster policies and must be set directly in the Jobs API request. For example, if you need to specify a specific version of a library for your job, you will need to specify that in the Jobs API request rather than in the cluster policy.
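
A related detail worth checking (an assumption based on the documented new_cluster fields, not something confirmed in this thread): when a job cluster is created through the API, the policy's fixed and default values are only filled in for attributes you omit if apply_policy_default_values is set. A minimal sketch of the same request with that flag:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "policy_id": "<cluster_policy_id>",
      # Assumption: asks the API to fill omitted attributes
      # (spark_version, node_type_id, ...) from the policy.
      "apply_policy_default_values": True,
      "spark_conf": {
        "spark.databricks.sql.initial.catalog.name": default_catalog
      }
    }
  }
}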

Colter
New Contributor II

The last paragraph reinforces the issues I was seeing between the expected behavior of cluster configuration in the Jobs API and the policy's actual behavior. I was trying to specify the Spark version in the cluster policy and omit it from the Jobs API, but this didn't work.

In fact, it didn't seem like any configurations were passed into the API-created cluster; not even my Spark configurations were applied.
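
One way to check what actually reached the job definition (a sketch, reusing the placeholder HOST, TOKEN, and job_id from the earlier snippets) is to read the job back and inspect its new_cluster block:

import requests

resp = requests.get(
    f"{HOST}/api/2.1/jobs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": job_id[0]},
)
resp.raise_for_status()
# Prints the cluster spec stored on the job; attributes the policy is
# expected to supply will not appear here if they were omitted from the request.
print(resp.json()["settings"]["new_cluster"])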

