Wednesday
Hello,
I am trying to create an on-demand cluster in Azure Databricks using the code below, and I am getting this error message:
{"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified.","details":[{"@type":"type.googleapis.com/google.rpc.ErrorInfo","reason":"CM_API_ERROR_SOURCE_CALLER_ERROR","domain":""}]}
I have tried different settings multiple times and still get the same error every time. Can someone please help me resolve this issue? As I understand it, it is not possible to create an on-demand cluster with autoscaling. Can someone please confirm whether my understanding is correct?
import requests
import json

clusterConfig = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 2,
        "autotermination_minutes": 10
    }
}

# Workspace URL and API token (the token is read from a Databricks secret scope)
workspaceUrl = "https://###########.azuredatabricks.net"
databricksToken = dbutils.secrets.get(scope="##############", key="dbx-token")

# Headers for the API request
headers = {
    "Authorization": f"Bearer {databricksToken}",
    "Content-Type": "application/json"
}

try:
    # Send the API request to create the cluster
    response = requests.post(
        f"{workspaceUrl}/api/2.0/clusters/create",
        headers=headers,
        data=json.dumps(clusterConfig)
    )
    if response.status_code == 200:
        # Extract the cluster_id from the response
        cluster_data = response.json()
        cluster_id = cluster_data["cluster_id"]
        print(f"Cluster created successfully! Cluster ID: {cluster_id}")
    else:
        print(f"Error creating cluster: {response.status_code}, {response.text}")
except Exception as e:
    print("ErrorMessage:" + str(e))
Thanks
Thursday
You cannot specify both num_workers and autoscale simultaneously. To resolve the issue, you should remove the autoscale parameter if you want to use a fixed number of workers.
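A minimal client-side sketch of the rule described above, written against the wording of the error message. It is not Databricks' own server-side validation; SIZING_KEYS, sizing_keys_present and check_exactly_one_sizing are hypothetical helpers for checking a payload before sending it. Note that the check only looks at the top level of the payload, so fields nested under another key count as absent.

```python
# The three mutually exclusive worker-sizing fields named in the error message.
SIZING_KEYS = ("num_workers", "autoscale", "virtual_cluster_size")


def sizing_keys_present(payload: dict) -> list:
    """Return the worker-sizing keys found at the TOP LEVEL of the payload."""
    return [key for key in SIZING_KEYS if key in payload]


def check_exactly_one_sizing(payload: dict) -> None:
    """Raise ValueError unless exactly one worker-sizing key is present at the top level."""
    found = sizing_keys_present(payload)
    if len(found) != 1:
        raise ValueError(
            f"Exactly 1 of {', '.join(SIZING_KEYS)} must be specified; found {found or 'none'}"
        )


# Usage: a fixed-size spec passes, a spec with both fields is rejected.
check_exactly_one_sizing({"num_workers": 2})
try:
    check_exactly_one_sizing({"num_workers": 2, "autoscale": {"min_workers": 1, "max_workers": 4}})
except ValueError as err:
    print(err)
```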
Thursday
Thanks @Walter_C for the reply. I had already tried what you suggested, and it fails with the same issue.
Thursday
@vivek_cloudde thanks for your question!
You can absolutely create an on-demand cluster with autoscaling on Azure; it's not blocked. You just need to remove any conflicting parameters so that the API sees only one worker-configuration setting.
You're hitting the error because Databricks expects exactly one of these in your cluster config: num_workers, autoscale, or virtual_cluster_size (rarely used). So you can have an on-demand cluster ("availability": "ON_DEMAND_AZURE") with autoscaling by specifying:
"autoscale": {
"min_workers": 2,
"max_workers": 8
}
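A minimal sketch of building such a payload so that only one of the two sizing options can ever end up in it. worker_sizing is a hypothetical helper; the cluster name, Spark version and node type are placeholders already used in this thread, and all fields sit at the top level of the payload.

```python
def worker_sizing(num_workers=None, min_workers=None, max_workers=None) -> dict:
    """Return either a fixed-size or an autoscale fragment, never both.

    Hypothetical helper: pass num_workers for a fixed-size cluster, or
    min_workers and max_workers for an autoscaling one.
    """
    fixed = num_workers is not None
    scaling = min_workers is not None or max_workers is not None
    if fixed == scaling:
        raise ValueError("Pass either num_workers or min_workers/max_workers, not both or neither")
    if fixed:
        return {"num_workers": num_workers}
    if min_workers is None or max_workers is None or min_workers > max_workers:
        raise ValueError("Autoscale needs both min_workers and max_workers, with min <= max")
    return {"autoscale": {"min_workers": min_workers, "max_workers": max_workers}}


# On-demand Azure availability combined with autoscaling.
cluster_spec = {
    "cluster_name": "cluster-name",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
    **worker_sizing(min_workers=2, max_workers=8),
}
```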
That said, in this case the error message could be confusing. Coming back to your JSON: nothing in this snippet explicitly sets autoscale or min_workers; it only has "num_workers": 2, so the error suggests that something else is adding, or conflicting with, autoscale / virtual_cluster_size. Is this the actual JSON being sent to the API endpoint?
The only unusual thing I can spot in this JSON is that you have "SingleNode" in custom_tags but are also specifying num_workers. I'm not very confident this would work: a single-node cluster usually has no workers, so num_workers would be 0. It may even be that setting num_workers at all, whatever its value, makes the validation fail. Please try removing num_workers first.
Can you please try the following, or something simpler, and build it up step by step, fixing issues along the way:
clusterConfig = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 0,
        "autotermination_minutes": 10
    }
}
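For comparison, a sketch of the same single-node definition in the top-level shape used by the payload that eventually worked in this thread (no "new_cluster" wrapper). single_node_spec and its cluster_name are hypothetical; spark.databricks.delta.preview.enabled, first_on_demand, spot_bid_max_price and enable_elastic_disk are left out only to keep it minimal, not because they are known to cause problems.

```python
# Single-node definition with fields at the top level of the request body.
single_node_spec = {
    "cluster_name": "single-node-cluster",  # hypothetical name
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "num_workers": 0,  # single node: no workers, the driver runs everything
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
    "autotermination_minutes": 10,
}
```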
Thursday
Thank you @VZLA for the detailed reply.
I tried running with the single-node configuration you suggested but got the same error. I have tried different configurations, and all of them give the same error. Even a simple configuration like this one fails with the same error:
clusterConfig = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE"
        },
        "autoscale": {
            "min_workers": 5,
            "max_workers": 15
        }
    }
}
Thanks
Thursday
Can you try the following:
clusterConfig = {
    "cluster_name": "cluster-name",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "azure_attributes": {
        "availability": "ON_DEMAND_AZURE"
    },
    "autoscale": {
        "min_workers": 5,
        "max_workers": 15
    }
}
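I tested this internally and it worked for me. It seems you are using the job cluster definition format to create an all-purpose cluster: your fields are wrapped in "new_cluster", whereas the clusters/create endpoint expects them at the top level.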
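A minimal sketch of sending that top-level payload. It uses the standard library's urllib.request instead of requests only so that it has no third-party dependency; requests.post(url, headers=..., json=cluster_spec) should behave the same way. build_create_request and create_cluster are hypothetical helper names, and the workspace URL and secret scope stay masked as in the original post.

```python
import json
import urllib.request


def build_create_request(workspace_url: str, token: str, cluster_spec: dict) -> urllib.request.Request:
    """Build a POST request for the Clusters API create endpoint.

    The spec becomes the JSON body as-is, with its fields at the top level.
    It is deliberately NOT wrapped in "new_cluster" (the job-cluster shape).
    """
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(cluster_spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_cluster(workspace_url: str, token: str, cluster_spec: dict) -> str:
    """Send the request and return the new cluster_id.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so an
    INVALID_PARAMETER_VALUE reply surfaces as an exception.
    """
    request = build_create_request(workspace_url, token, cluster_spec)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["cluster_id"]


# Payload from the reply above: fields at the top level, autoscale only.
cluster_spec = {
    "cluster_name": "cluster-name",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
    "autoscale": {"min_workers": 5, "max_workers": 15},
}

# In a Databricks notebook (dbutils is only available there):
# token = dbutils.secrets.get(scope="##############", key="dbx-token")
# cluster_id = create_cluster("https://###########.azuredatabricks.net", token, cluster_spec)
```

Splitting request construction from sending keeps the payload easy to inspect (or log) before anything goes over the network.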
Thursday
Thank you so much @Walter_C it worked.
Thursday
Glad to hear it worked.
yesterday
@vivek_cloudde I still find it interesting that all these different misconfigurations and incorrect cluster definitions gave you the same error message, but anyway, happy to hear it worked!
If it helps, next time you could fill out the Create compute UI once and then copy the JSON definition it generates.
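A likely explanation for the identical error: every configuration posted before the working one (the original, the single-node variant and the minimal autoscale one) nests its fields under "new_cluster", the shape used for job clusters. If clusters/create reads the sizing fields from the top level of the body, it finds none of num_workers, autoscale or virtual_cluster_size, whatever is inside the wrapper. Below is a sketch of a hypothetical helper that unwraps such a spec; dropping effective_spark_version is an assumption, based only on it being absent from the payload that worked.

```python
import copy
from typing import Optional

SIZING_KEYS = ("num_workers", "autoscale", "virtual_cluster_size")


def to_clusters_create_payload(spec: dict, cluster_name: Optional[str] = None) -> dict:
    """Unwrap a job-style {"new_cluster": {...}} spec into a top-level payload.

    Hypothetical helper. A spec without "new_cluster" is copied unchanged.
    """
    payload = copy.deepcopy(spec.get("new_cluster", spec))
    # Assumption: drop effective_spark_version (absent from the payload that worked).
    payload.pop("effective_spark_version", None)
    if cluster_name is not None:
        payload.setdefault("cluster_name", cluster_name)
    return payload


# The "simple configuration" from earlier in the thread, as it was posted.
job_style = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
        "autoscale": {"min_workers": 5, "max_workers": 15},
    }
}

print([k for k in SIZING_KEYS if k in job_style])  # prints []
payload = to_clusters_create_payload(job_style, cluster_name="cluster-name")
print([k for k in SIZING_KEYS if k in payload])  # prints ['autoscale']
```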