Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there a way to use cluster policies with the Jobs API to define cluster configuration, rather than defining it in the Jobs API call itself?

Colter
New Contributor II

I want to create a cluster policy that is referenced by most of our repos/jobs, so we have one place to update whenever there is a Spark version change or we need to add additional Spark configurations. Cluster policies seemed like a good fit for this: creating a separate library instead would require re-triggering the pipeline whenever we edit the library for the changes to take effect on the actual job.

Here is the cluster policy I would like to use:

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*, 4]",
    "hidden": true
  },
  "spark_conf.spark.databricks.dataLineage.enabled": {
    "type": "fixed",
    "value": "true",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "fixed",
    "value": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  }
}
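For reference, a policy like this can be registered once through the Cluster Policies API (`POST /api/2.0/policies/clusters/create`) and then referenced by ID from every job. A minimal sketch follows; the host/token handling and the policy name `single-node-jobs` are illustrative assumptions, and note that the `definition` field must be sent as a JSON-encoded string, not a nested object:

```python
import json

# Shortened version of the policy above (full definition would include the
# spark_conf and aws_attributes keys as well).
POLICY_DEFINITION = {
    "spark_version": {"type": "fixed", "value": "11.3.x-scala2.12", "hidden": True},
    "node_type_id": {"type": "fixed", "value": "i3.xlarge", "hidden": True},
}

def build_create_policy_request(name: str) -> dict:
    """Build the body for POST /api/2.0/policies/clusters/create.

    The API expects "definition" to be a JSON string, so we serialize it.
    """
    return {
        "name": name,
        "definition": json.dumps(POLICY_DEFINITION),
    }

body = build_create_policy_request("single-node-jobs")
# Sending it would look roughly like (requires a real workspace host/token):
# requests.post(f"{host}/api/2.0/policies/clusters/create",
#               headers={"Authorization": f"Bearer {token}"}, json=body)
print(body["name"])
```

The returned `policy_id` is what the jobs below would then reference.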

Here is the JSON for the Jobs API showing how I want to use cluster policies:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "cluster_policy_id": "<cluster_policy_id>",
      "spark_conf": {
        "spark.databricks.sql.initial.catalog.name": default_catalog
      }
    }
  }
}

Has anyone succeeded in using cluster policies with the Jobs API to specify cluster parameters, instead of specifying them in the API call itself?

3 REPLIES

Anonymous
Not applicable

@Colter Nattrass:

Yes, it is possible to use cluster policies within Jobs API to define cluster configuration rather than in the Jobs API itself. To do this, you can reference the cluster policy ID in the new_cluster section of the Jobs API request instead of defining the cluster configuration directly. Here is an example:

json={
  "job_id": job_id[0],
  "new_settings": {
    "name": job_config['job_name'],
    "new_cluster": {
      "cluster_policy_id": "<cluster_policy_id>"
    }
  }
}
 

In the cluster_policy_id field, replace <cluster_policy_id> with the actual ID of your cluster policy. This will apply the cluster policy's configuration settings to the cluster used by the job.

Note that some configuration settings cannot be set via cluster policies and must be set directly in the Jobs API request. For example, if you need to specify a specific version of a library for your job, you will need to specify that in the Jobs API request rather than in the cluster policy.
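For completeness, here is a sketch of how such a Jobs API payload might look when the policy is expected to fill in the cluster configuration. Two details in it are assumptions worth verifying against the current Databricks REST API docs rather than taking as given: the field inside `new_cluster` may need to be `policy_id` (rather than `cluster_policy_id`), and `apply_policy_default_values: true` may be required if you want the policy's fixed/default values actually applied to the cluster instead of merely validated against:

```python
def build_job_reset_payload(job_id: int, job_name: str, policy_id: str,
                            default_catalog: str) -> dict:
    """Build a body for a jobs reset/update call that references a cluster
    policy instead of spelling out the full cluster spec."""
    return {
        "job_id": job_id,
        "new_settings": {
            "name": job_name,
            "new_cluster": {
                # Assumed field name; verify against the Jobs API reference.
                "policy_id": policy_id,
                # Ask the backend to fill in the policy's fixed/default
                # values rather than only validating the request.
                "apply_policy_default_values": True,
                "spark_conf": {
                    "spark.databricks.sql.initial.catalog.name": default_catalog,
                },
            },
        },
    }

payload = build_job_reset_payload(123, "nightly-etl", "<cluster_policy_id>", "main")
print(payload["new_settings"]["name"])
```

If fixed values from the policy (such as `spark_version`) are not appearing on the created cluster, the `apply_policy_default_values` flag is the first thing to check.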

Colter
New Contributor II

The last paragraph reinforced the issues I was seeing between the expected behavior of cluster config in the Jobs API and the policy's actual behavior. I was trying to specify the Spark version in the cluster policy and omit it from the Jobs API, but this didn't work.

In fact, it didn't seem like any configurations were passed into the API-created cluster; not even my Spark configurations were applied.

Anonymous
Not applicable

Hi @Colter Nattrass

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
