<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue while creating on-demand cluster in azure databricks using pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103998#M41627</link>
    <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;for the detailed response.&lt;BR /&gt;I ran the single-node configuration you suggested but got the same error. I tried several other configurations, and they all fail the same way. Even a configuration as simple as this one fails with the same error.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE"
        },
        "autoscale": {
            "min_workers": 5,
            "max_workers": 15
        }
    }
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Thanks&lt;/P&gt;</description>
    <pubDate>Thu, 02 Jan 2025 19:42:25 GMT</pubDate>
    <dc:creator>vivek_cloudde</dc:creator>
    <dc:date>2025-01-02T19:42:25Z</dc:date>
    <item>
      <title>Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103852#M41575</link>
      <description>&lt;P&gt;&lt;FONT size="2"&gt;Hello,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;I am trying to create an on demand cluster in azure databricks using below code and i am getting the error message&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT size="2" color="#FF0000"&gt;&lt;SPAN&gt;{"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified.","details":[{"@type":"type.googleapis.com/google.rpc.ErrorInfo","reason":"CM_API_ERROR_SOURCE_CALLER_ERROR","domain":""}]}&lt;/SPAN&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;I tried different settings multiple times and still getting the same error every time. Can someone please help me resolve this issue? As per my understanding it is not possible to create ondemand cluster with auto scale capabilities. Can someone please confirm if my understanding is correct?&lt;/FONT&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import requests
import json

clusterConfig={
"new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 2,
        "autotermination_minutes": 10
    }
}

# Initialize the DatabricksAPI with your workspace URL and token
workspaceUrl = "https://###########.azuredatabricks.net"
databricksToken = dbutils.secrets.get(scope="##############", key="dbx-token")

# Headers for the API request
headers = {
    "Authorization": f"Bearer {databricksToken}",
    "Content-Type": "application/json"
}

try:
  # Send the API request to create the cluster
  response = requests.post(
      f"{workspaceUrl}/api/2.0/clusters/create",
      headers=headers,
      data=json.dumps(clusterConfig)
  )

  if response.status_code == 200:
    # Extract the cluster_id from the response
    cluster_data = response.json()
    cluster_id = cluster_data["cluster_id"]
    print(f"Cluster created successfully! Cluster ID: {cluster_id}")
  else:
    print(f"Error creating cluster: {response.status_code}, {response.text}")
except Exception as e:
  print("ErrorMessage:" + str(e))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Thanks&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 02:19:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103852#M41575</guid>
      <dc:creator>vivek_cloudde</dc:creator>
      <dc:date>2025-01-02T02:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103903#M41598</link>
      <description>&lt;P&gt;You cannot specify both &lt;CODE&gt;num_workers&lt;/CODE&gt; and &lt;CODE&gt;autoscale&lt;/CODE&gt; simultaneously.&amp;nbsp;To resolve the issue, you should remove the &lt;CODE&gt;autoscale&lt;/CODE&gt; parameter if you want to use a fixed number of workers.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 12:08:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103903#M41598</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-02T12:08:08Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103991#M41625</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/139672"&gt;@vivek_cloudde&lt;/a&gt;&amp;nbsp;thanks for your question!&lt;/P&gt;
&lt;P&gt;You can absolutely create an on-demand cluster with auto-scaling on Azure; it is not blocked. You just need to remove any conflicting parameters so the API sees exactly one worker-size setting.&lt;/P&gt;
&lt;P&gt;You’re hitting the error because Databricks expects exactly one of these in your cluster config:&amp;nbsp;num_workers, &lt;STRONG&gt;or&lt;/STRONG&gt; autoscale, &lt;STRONG&gt;or&lt;/STRONG&gt;&amp;nbsp;virtual_cluster_size (&lt;EM&gt;rarely used&lt;/EM&gt;). So you can have an on-demand cluster ("availability": "ON_DEMAND_AZURE") with autoscaling by specifying:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;"autoscale": {
  "min_workers": 2,
  "max_workers": 8
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Admittedly, the error message can be confusing here. Coming back to your JSON: nothing in this snippet explicitly sets an autoscale or min_workers field, it only has "num_workers": 2, so the error suggests something else is adding or conflicting with autoscale / virtual_cluster_size. Is this the actual JSON being sent to the API endpoint?&lt;/P&gt;
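One quick way to answer that is to print the exact payload immediately before the request is sent; a minimal sketch (the config shown here is a placeholder for whatever dict is actually passed to requests.post):

```python
import json

# Placeholder config; substitute the dict actually passed to requests.post
clusterConfig = {"new_cluster": {"num_workers": 2}}

# Serialize the same way the request does, so the printed text matches
# the bytes that reach the API endpoint
payload = json.dumps(clusterConfig, indent=2, sort_keys=True)
print(payload)
```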
&lt;P&gt;The only unusual thing I can spot in this JSON is that you have "SingleNode" in custom_tags while also specifying num_workers. I'm not confident that combination works: a single-node cluster usually has no workers, so num_workers would be 0, and the validation may fail regardless of the value. Please try removing num_workers first.&lt;/P&gt;
&lt;P&gt;Can you please try the following, or something even simpler, and then build it up from there, fixing issues along the way:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            "spark.databricks.delta.preview.enabled": true
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": true,
        "num_workers": 0,
        "autotermination_minutes": 10
    }
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 18:10:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103991#M41625</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2025-01-02T18:10:04Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103998#M41627</link>
      <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;for the detailed response.&lt;BR /&gt;I ran the single-node configuration you suggested but got the same error. I tried several other configurations, and they all fail the same way. Even a configuration as simple as this one fails with the same error.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE"
        },
        "autoscale": {
            "min_workers": 5,
            "max_workers": 15
        }
    }
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 19:42:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103998#M41627</guid>
      <dc:creator>vivek_cloudde</dc:creator>
      <dc:date>2025-01-02T19:42:25Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103999#M41628</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/88823"&gt;@Walter_C&lt;/a&gt;&amp;nbsp;for the reply. I had already tried what you suggested, and it fails with the same issue.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 19:44:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/103999#M41628</guid>
      <dc:creator>vivek_cloudde</dc:creator>
      <dc:date>2025-01-02T19:44:51Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104001#M41630</link>
      <description>&lt;P&gt;Can you try with the following:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;clusterConfig = {
  "cluster_name": "cluster-name",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_D14_v2",
  "azure_attributes": {
    "availability": "ON_DEMAND_AZURE"
  },
  "autoscale": {
    "min_workers": 5,
    "max_workers": 15
  }
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
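Before sending the request, a local pre-check that mirrors the API's "Exactly 1 of virtual_cluster_size, num_workers or autoscale" rule can catch this class of mistake, including a stray "new_cluster" wrapper. A sketch (the validator and its messages are illustrative, not part of any Databricks SDK; field names come from the error message in this thread):

```python
def count_size_fields(config: dict) -> int:
    """Count the mutually exclusive sizing fields present at the top level."""
    return sum(k in config for k in ("virtual_cluster_size", "num_workers", "autoscale"))


def check_cluster_config(config: dict) -> None:
    """Mirror the API's 'exactly 1 of ...' rule before posting the payload."""
    if "new_cluster" in config:
        # A "new_cluster" wrapper belongs to a Jobs API payload; the
        # /api/2.0/clusters/create endpoint expects the fields at the top level,
        # so a wrapped config hides the sizing field from validation.
        raise ValueError("remove the 'new_cluster' wrapper for clusters/create")
    if count_size_fields(config) != 1:
        raise ValueError("Exactly 1 of virtual_cluster_size, num_workers "
                         "or autoscale must be specified")


clusterConfig = {
    "cluster_name": "cluster-name",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
    "autoscale": {"min_workers": 5, "max_workers": 15},
}
check_cluster_config(clusterConfig)  # no exception: autoscale is the only sizing field
```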
&lt;P&gt;I tested this internally and it worked for me. It seems you were using the job-cluster creation format (the config nested under "new_cluster") to create an all-purpose cluster; the clusters/create endpoint expects these fields at the top level of the payload.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 20:52:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104001#M41630</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-02T20:52:53Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104004#M41632</link>
      <description>&lt;P&gt;Thank you so much&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/88823"&gt;@Walter_C&lt;/a&gt;&amp;nbsp;it worked.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 21:55:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104004#M41632</guid>
      <dc:creator>vivek_cloudde</dc:creator>
      <dc:date>2025-01-02T21:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104011#M41635</link>
      <description>&lt;P&gt;Glad to hear it worked!&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 00:39:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104011#M41635</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-03T00:39:21Z</dc:date>
    </item>
    <item>
      <title>Re: Issue while creating on-demand cluster in azure databricks using pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104050#M41650</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/139672"&gt;@vivek_cloudde&lt;/a&gt;&amp;nbsp;I still find it interesting that all these different misconfigurations and wrong cluster definitions produced the same error message, but in any case, happy to hear it worked!&lt;/P&gt;
&lt;P&gt;If it helps, next time you can make things simpler by first filling out the &lt;A class="du-bois-light-typography css-cs8bxj" href="https://docs.databricks.com/compute/configure.html" target="_blank" rel="noopener noreferrer"&gt;create compute UI&lt;/A&gt; and then copying the generated JSON definition from the UI.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 10:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-while-creating-on-demand-cluster-in-azure-databricks-using/m-p/104050#M41650</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2025-01-03T10:36:04Z</dc:date>
    </item>
  </channel>
</rss>

