Data Engineering

Issue while creating an on-demand cluster in Azure Databricks using PySpark

vivek_cloudde
New Contributor II

Hello,

I am trying to create an on-demand cluster in Azure Databricks using the code below, and I am getting this error message:
{"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified.","details":[{"@type":"type.googleapis.com/google.rpc.ErrorInfo","reason":"CM_API_ERROR_SOURCE_CALLER_ERROR","domain":""}]}
I tried different settings multiple times and still get the same error every time. Can someone please help me resolve this issue? As I understand it, it is not possible to create an on-demand cluster with autoscaling. Can someone please confirm whether my understanding is correct?

import requests
import json

clusterConfig={
"new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 2,
        "autotermination_minutes": 10
    }
}

# Initialize the DatabricksAPI with your workspace URL and token
workspaceUrl = "https://###########.azuredatabricks.net"
databricksToken = dbutils.secrets.get(scope="##############", key="dbx-token")

# Headers for the API request
headers = {
    "Authorization": f"Bearer {databricksToken}",
    "Content-Type": "application/json"
}

try:
  # Send the API request to create the cluster
  response = requests.post(
      f"{workspaceUrl}/api/2.0/clusters/create",
      headers=headers,
      data=json.dumps(clusterConfig)
  )

  if response.status_code == 200:
    # Extract the cluster_id from the response
    cluster_data = response.json()
    cluster_id = cluster_data["cluster_id"]
    print(f"Cluster created successfully! Cluster ID: {cluster_id}")
  else:
    print(f"Error creating cluster: {response.status_code}, {response.text}")
except Exception as e:
  print("ErrorMessage:" + str(e))

 Thanks 


Walter_C
Databricks Employee

You cannot specify both num_workers and autoscale simultaneously. To resolve the issue, you should remove the autoscale parameter if you want to use a fixed number of workers.

vivek_cloudde
New Contributor II

Thanks @Walter_C for the reply. I already tried what you suggested, and it fails with the same error.

VZLA
Databricks Employee

@vivek_cloudde thanks for your question!

You can absolutely create an on-demand cluster with autoscaling on Azure; it's not blocked. You just need to remove any conflicting parameters so the API sees only one setting for worker configuration.

You're hitting the error because Databricks expects exactly one of these in your cluster config: num_workers, autoscale, or virtual_cluster_size (rarely used). So you can have an on-demand cluster ("availability": "ON_DEMAND_AZURE") with autoscaling by specifying:

 

"autoscale": {
  "min_workers": 2,
  "max_workers": 8
}

 

That said, the error message could be confusing in this case. Coming back to your JSON: nothing in this snippet explicitly sets autoscale or min_workers; it only has "num_workers": 2, so the error suggests that something else is adding or conflicting with autoscale or virtual_cluster_size. Is this the actual JSON going to the API endpoint?
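One way to look at this locally before calling the API is to list which of the three sizing fields sit at the top level of the request body. This is only a sketch: `sizing_fields_present` is a hypothetical helper, not part of any Databricks library, and it assumes the server-side check looks only at top-level keys, which may not match the real validation exactly.

```python
# Hypothetical local pre-check; the real validation happens server-side.
SIZING_FIELDS = ("num_workers", "autoscale", "virtual_cluster_size")

def sizing_fields_present(payload: dict) -> list:
    """Return the sizing fields found at the top level of the request body."""
    return [field for field in SIZING_FIELDS if field in payload]

nested = {"new_cluster": {"num_workers": 2}}  # shape used in the question
flat = {"num_workers": 2}                     # sizing field at the top level

print(sizing_fields_present(nested))  # no sizing field visible at the top level
print(sizing_fields_present(flat))    # exactly one sizing field
```

Under that assumption, a body whose sizing field is nested under another key would look the same to the check as a body with no sizing field at all.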

The only unusual thing I can spot in this JSON is that you have "SingleNode" in custom_tags while also specifying num_workers. I'm not confident this would work: a single-node cluster usually has no workers, so num_workers would be 0. It is also possible that the validation fails whenever num_workers is set, regardless of its value. Please try removing num_workers first.

Can you please try the following, or something simpler, and build it up from there, fixing issues along the way:

 

clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 0,
        "autotermination_minutes": 10
    }
}

 

 

vivek_cloudde
New Contributor II

Thank you @VZLA for the detailed reply.
I tried the single-node configuration you suggested but got the same error. I also tried several other configurations, and they all give the same error. Even a simple configuration like this one fails with the same error.

clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE"
        },
        "autoscale": {
            "min_workers": 5,
            "max_workers": 15
        }
    }
}

 Thanks

Walter_C
Databricks Employee (Accepted Solution)

Can you try with the following:

 

clusterConfig = {
  "cluster_name": "cluster-name",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_D14_v2",
  "azure_attributes": {
    "availability": "ON_DEMAND_AZURE"
  },
  "autoscale": {
    "min_workers": 5,
    "max_workers": 15
  }
}

 

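I tested this internally and it worked for me. It seems you are using the job cluster creation format (settings wrapped in "new_cluster") to create an all-purpose cluster; the /api/2.0/clusters/create endpoint expects the settings at the top level of the request body.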
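To make the difference concrete, here is a minimal sketch, assuming the settings were originally written in the nested shape from the question. `flatten_cluster_config` is a hypothetical helper name and the `cluster_name` default is a placeholder; it lifts the contents of "new_cluster" to the top level, which is the shape the working example above uses.

```python
import json

def flatten_cluster_config(config: dict, cluster_name: str = "cluster-name") -> dict:
    """Lift settings out of a "new_cluster" wrapper, if one is present."""
    flat = dict(config.get("new_cluster", config))  # shallow copy; input is not mutated
    flat.setdefault("cluster_name", cluster_name)   # placeholder name if none was given
    return flat

nested = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
        "autoscale": {"min_workers": 5, "max_workers": 15},
    }
}

payload = flatten_cluster_config(nested)
print(json.dumps(payload, indent=2))
# The flattened payload can then be sent the same way as in the question:
# requests.post(f"{workspaceUrl}/api/2.0/clusters/create",
#               headers=headers, data=json.dumps(payload))
```

The helper leaves an already-flat config unchanged apart from filling in a missing cluster name, so it can be applied to either shape.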

vivek_cloudde
New Contributor II

Thank you so much @Walter_C it worked.

Walter_C
Databricks Employee

Glad to hear it worked

VZLA
Databricks Employee

@vivek_cloudde I still find it interesting that all these different misconfigured cluster definitions produced the same error message, but anyway, happy to hear it worked!

If it helps, to make things simpler next time, you could fill out the create compute UI once and then copy the generated JSON definition from it.
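If you go that route, note that the UI shows JSON, where booleans are written true/false, while a Python dict literal needs True/False. A small sketch, assuming the copied definition is pasted into a string (the JSON below is an illustrative placeholder, not real UI output): parsing it with `json.loads` avoids converting it by hand.

```python
import json

# Placeholder standing in for JSON copied from the create compute UI.
ui_json = """
{
  "cluster_name": "cluster-name",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_D14_v2",
  "enable_elastic_disk": true,
  "autoscale": {"min_workers": 2, "max_workers": 8}
}
"""

clusterConfig = json.loads(ui_json)  # JSON true becomes Python True
print(clusterConfig["enable_elastic_disk"])
```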
