Data Engineering

Issue while creating an on-demand cluster in Azure Databricks using PySpark

vivek_cloudde
New Contributor

Hello,

I am trying to create an on-demand cluster in Azure Databricks using the code below, and I am getting this error message:
{"error_code":"INVALID_PARAMETER_VALUE","message":"Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified.","details":[{"@type":"type.googleapis.com/google.rpc.ErrorInfo","reason":"CM_API_ERROR_SOURCE_CALLER_ERROR","domain":""}]}
I tried different settings multiple times and still get the same error every time. Can someone please help me resolve this issue? As per my understanding, it is not possible to create an on-demand cluster with autoscale capabilities. Can someone please confirm whether my understanding is correct?

import requests
import json

clusterConfig={
"new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": True
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": True,
        "num_workers": 2,
        "autotermination_minutes": 10
    }
}

# Initialize the DatabricksAPI with your workspace URL and token
workspaceUrl = "https://###########.azuredatabricks.net"
databricksToken = dbutils.secrets.get(scope="##############", key="dbx-token")

# Headers for the API request
headers = {
    "Authorization": f"Bearer {databricksToken}",
    "Content-Type": "application/json"
}

try:
  # Send the API request to create the cluster
  response = requests.post(
      f"{workspaceUrl}/api/2.0/clusters/create",
      headers=headers,
      data=json.dumps(clusterConfig)
  )

  if response.status_code == 200:
    # Extract the cluster_id from the response
    cluster_data = response.json()
    cluster_id = cluster_data["cluster_id"]
    print(f"Cluster created successfully! Cluster ID: {cluster_id}")
  else:
    print(f"Error creating cluster: {response.status_code}, {response.text}")
except Exception as e:
  print("ErrorMessage:" + str(e))

 Thanks 

1 ACCEPTED SOLUTION


Walter_C
Databricks Employee

Can you try with the following:

 

clusterConfig = {
  "cluster_name": "cluster-name",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_D14_v2",
  "azure_attributes": {
    "availability": "ON_DEMAND_AZURE"
  },
  "autoscale": {
    "min_workers": 5,
    "max_workers": 15
  }
}

 

I tested this internally and it worked for me. It seems that you are using the job cluster creation mechanism to create an all-purpose cluster.
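To make the distinction concrete, here is a rough sketch (assumed shapes, not verbatim from the original post): job definitions nest the cluster spec under "new_cluster", while POST /api/2.0/clusters/create expects those fields at the top level of the request body:

# All-purpose cluster: fields sit at the top level of the clusters/create body
all_purpose_spec = {
    "cluster_name": "cluster-name",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D14_v2",
    "autoscale": {"min_workers": 5, "max_workers": 15},
}

# Job-style payload: the same spec nested under "new_cluster", as used inside
# job/task definitions. Sent to clusters/create, the top level contains none of
# num_workers, autoscale, or virtual_cluster_size, which triggers the error above.
job_style_payload = {"new_cluster": all_purpose_spec}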


8 REPLIES

Walter_C
Databricks Employee

You cannot specify both num_workers and autoscale simultaneously. To resolve the issue, you should remove the autoscale parameter if you want to use a fixed number of workers.
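For illustration, a minimal sketch of the two mutually exclusive worker settings (only the relevant fields are shown; the rest of the cluster spec is omitted):

# Fixed-size cluster: only num_workers is set
fixed_size_spec = {"num_workers": 2}

# Autoscaling cluster: only autoscale is set
autoscaling_spec = {"autoscale": {"min_workers": 2, "max_workers": 8}}

# A spec containing both, or neither, fails with:
# "Exactly 1 of virtual_cluster_size, num_workers or autoscale must be specified."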

vivek_cloudde
New Contributor

Thanks @Walter_C for the reply. I already tried what you suggested, and it is failing with the same issue.

VZLA
Databricks Employee

@vivek_cloudde thanks for your question!

You can absolutely create an on-demand cluster with auto-scaling on Azure—it’s not blocked. You just need to remove any conflicting parameters so the API sees only one setting for worker configuration.

You’re hitting the error because Databricks expects exactly one of these in your cluster config: num_workers, autoscale, or virtual_cluster_size (rarely used). So you can have an on-demand cluster ("availability": "ON_DEMAND_AZURE") with autoscaling by specifying:

 

"autoscale": {
  "min_workers": 2,
  "max_workers": 8
}

 

Although in this case the error message could be confusing. Now, coming back to your JSON: nothing in this snippet explicitly sets an autoscale or min_workers field, it only has "num_workers": 2, so the error suggests something else is adding or conflicting with autoscale / virtual_cluster_size. Is this the actual JSON going to the API endpoint?
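As a quick check, and only as a sketch reusing the names from your snippet, you can print the serialized body immediately before the POST to confirm nothing else injects those fields:

import json

# Serialize the payload once, inspect it, then send exactly the same string
payload = json.dumps(clusterConfig, indent=2)
print(payload)
# response = requests.post(f"{workspaceUrl}/api/2.0/clusters/create",
#                          headers=headers, data=payload)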

The only unusual thing I can spot in this JSON is that you have "SingleNode" in custom_tags, but you're also specifying num_workers. I'm not confident this combination works: a single-node cluster usually doesn't have workers, or num_workers would be 0. Even so, setting num_workers at all, regardless of its value, might be what fails the validation. Please try removing num_workers first.

Can you please try with the following, or something simpler, and start building it up while fixing the issues along the way:

 

clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            "spark.databricks.delta.preview.enabled": true
        },
        "custom_tags": {
            "ResourceClass": "SingleNode"
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": "-1"
        },
        "enable_elastic_disk": true,
        "num_workers": 0,
        "autotermination_minutes": 10
    }
}

 

 

vivek_cloudde
New Contributor

Thank you @VZLA for the detailed reply.
I tried running the configuration you suggested for single node, but got the same error. I tried different configurations, and they all give the same error. Even a simple configuration like this fails with the same error:

clusterConfig={
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "effective_spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_D14_v2",
        "azure_attributes": {
            "availability": "ON_DEMAND_AZURE"
        },
        "autoscale": {
            "min_workers": 5,
            "max_workers": 15
        }
    }
}

 Thanks


vivek_cloudde
New Contributor

Thank you so much @Walter_C, it worked.

Walter_C
Databricks Employee

Glad to hear it worked.

VZLA
Databricks Employee

@vivek_cloudde I still find it interesting that all these different misconfigurations and wrong cluster definitions produced the same error message, but anyway, happy to hear it worked!

If it helps, next time, to make things simpler, you can fill out the create compute UI once and then copy the generated JSON definition from the UI.
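In the same spirit, a small sketch (reusing workspaceUrl and headers from the earlier snippet; the cluster_id below is a placeholder): the full definition of an existing cluster can also be fetched from the API and reused as a template:

import requests

# Fetch the JSON definition of an existing cluster to reuse as a template
response = requests.get(
    f"{workspaceUrl}/api/2.0/clusters/get",
    headers=headers,
    params={"cluster_id": "1234-567890-abcde123"},  # hypothetical cluster ID
)
print(response.json())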

 
