Is there a way to use cluster policies within the Jobs API to define cluster configuration, rather than defining it in the Jobs API request itself?

Colter
New Contributor II

I want to create a cluster policy that is referenced by most of our repos/jobs, so we have one place to update whenever there is a Spark version change or we need to add additional Spark configurations. I figured cluster policies might be a good way to do this, since creating a separate library would require the pipeline to be re-triggered whenever we edit the library for the changes to take effect on the actual job.

Here is the cluster policy I would like to use:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*, 4]",
    "hidden": true
  },
  "spark_conf.spark.databricks.dataLineage.enabled": {
    "type": "fixed",
    "value": "true",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "fixed",
    "value": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  }
}
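
A definition like this is registered once through the Cluster Policies API and then referenced by ID from every job. A minimal sketch of creating it, assuming the workspace URL and a personal access token are exported as DATABRICKS_HOST and DATABRICKS_TOKEN, the definition above is saved in policy.json, and the policy name is arbitrary (all placeholder names):

import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

# The Cluster Policies API expects the policy definition as a JSON string.
with open("policy.json") as f:
    definition = f.read()

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "single-node-job-policy", "definition": definition},
)
resp.raise_for_status()
policy_id = resp.json()["policy_id"]  # reference this ID from your jobs
print(policy_id)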

Here is the JSON for the Jobs API, showing how I want to use cluster policies:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "cluster_policy_id": "<cluster_policy_id>",
      "spark_conf": {
        "spark.databricks.sql.initial.catalog.name": default_catalog
      }
    }
  }
}
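
For reference, a payload like this is typically sent to the Jobs API reset endpoint, which replaces a job's full settings. A sketch, with hypothetical placeholder values for the surrounding variables:

import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

# Hypothetical stand-ins for the variables used in the payload above.
job_id = [123]
job_config = {"job_name": "my-job"}
default_catalog = "main"

# Jobs API 2.1 "reset" overwrites the job's settings with new_settings.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/reset",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": job_id[0],
        "new_settings": {
            "name": job_config['job_name'],
            "new_cluster": {
                "cluster_policy_id": "<cluster_policy_id>",
                "spark_conf": {
                    "spark.databricks.sql.initial.catalog.name": default_catalog
                }
            }
        }
    },
)
resp.raise_for_status()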

Has anyone succeeded in using cluster policies with the Jobs API to specify cluster parameters, instead of specifying them directly in the API request?


Anonymous
Not applicable

@Colter Nattrass:

Yes, it is possible to use cluster policies with the Jobs API to define cluster configuration rather than defining it in the Jobs API request itself. To do this, reference the cluster policy's ID in the new_cluster section of the request (via the policy_id field) instead of defining the cluster configuration directly. Here is an example:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "policy_id": "<cluster_policy_id>"
    }
  }
}
 

In the policy_id field, replace <cluster_policy_id> with the actual ID of your cluster policy. This applies the cluster policy's configuration settings to the cluster used by the job.

Note that some configuration settings cannot be set via cluster policies and must be set directly in the Jobs API request. For example, if you need to specify a specific version of a library for your job, you will need to specify that in the Jobs API request rather than in the cluster policy.
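
A related detail worth checking (an assumption based on the documented new_cluster fields, not something confirmed in this thread): when a job cluster is created through the API, the policy's fixed and default values are only filled in for attributes you omit if apply_policy_default_values is set. A minimal sketch of the same request with that flag:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "policy_id": "<cluster_policy_id>",
      # Assumption: asks the API to fill omitted attributes
      # (spark_version, node_type_id, ...) from the policy.
      "apply_policy_default_values": True,
      "spark_conf": {
        "spark.databricks.sql.initial.catalog.name": default_catalog
      }
    }
  }
}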

Colter
New Contributor II

The last paragraph reinforces the issues I was seeing between the expected behavior of cluster configuration in the Jobs API and the policy's actual behavior. I was trying to specify the Spark version in the cluster policy and omit it from the Jobs API, but this didn't work.

In fact, it didn't seem like any configurations were passed into the API-created cluster; not even my Spark configurations were applied.
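
One way to check what actually reached the job definition (a sketch, reusing the placeholder HOST, TOKEN, and job_id from the earlier snippets) is to read the job back and inspect its new_cluster block:

import requests

resp = requests.get(
    f"{HOST}/api/2.1/jobs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": job_id[0]},
)
resp.raise_for_status()
# Prints the cluster spec stored on the job; attributes the policy is
# expected to supply will not appear here if they were omitted from the request.
print(resp.json()["settings"]["new_cluster"])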

