Databricks

mmlime · 10-26-2022

Hi,

Is there no option to take VMs from a pool for a new workflow (Azure Cloud)?

This is the default schema for a new cluster:

{
    "num_workers": 0,
    "spark_version": "10.4.x-scala2.12",
    "spark_conf": {
        "spark.master": "local[*, 4]",
        "spark.databricks.cluster.profile": "singleNode"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "ssh_public_keys": [],
    "custom_tags": {
        "ResourceClass": "SingleNode"
    },
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "enable_elastic_disk": true,
    "cluster_source": "JOB",
    "init_scripts": [],
    "data_security_mode": "SINGLE_USER",
    "runtime_engine": "STANDARD"
}

Vivian_Wilfred · 10-26-2022

@Michal Mlaka I just checked in the UI, and the pools are listed under the worker type in a job cluster configuration. It should work.

Vivian_Wilfred · 10-26-2022

Hi @Michal Mlaka, it should be possible. If you are using the APIs to launch the job, remove "node_type_id" and "driver_node_type_id" from the JSON and pass "instance_pool_id" and "driver_instance_pool_id" instead, so that the cluster picks its VMs from the pool.
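
As a rough illustration of that change, here is a minimal Python sketch that rewrites a cluster definition to use a pool. The helper name use_instance_pool and the placeholder "<your-pool-id>" are made up for this example, and reusing the worker pool for the driver when no separate driver pool is given is an assumption of the sketch. It only touches the four fields mentioned above and leaves everything else in the spec as it was.

```python
import copy
import json


def use_instance_pool(cluster_spec, pool_id, driver_pool_id=None):
    """Return a copy of cluster_spec that references a pool instead of a node type.

    Hypothetical helper for illustration only; it is not part of any
    Databricks SDK.
    """
    spec = copy.deepcopy(cluster_spec)  # leave the caller's dict untouched

    # Drop the explicit VM types, as suggested in the reply above.
    spec.pop("node_type_id", None)
    spec.pop("driver_node_type_id", None)

    # Point workers and driver at the pool. Falling back to the worker pool
    # for the driver is an assumption made by this sketch.
    spec["instance_pool_id"] = pool_id
    spec["driver_instance_pool_id"] = driver_pool_id or pool_id
    return spec


# A shortened version of the cluster definition from the question.
original = {
    "num_workers": 0,
    "spark_version": "10.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "cluster_source": "JOB",
}

# "<your-pool-id>" is a placeholder; substitute the ID of your own pool.
pooled = use_instance_pool(original, "<your-pool-id>")
print(json.dumps(pooled, indent=4))
```

The printed JSON is what you would then send as the new cluster definition of the job.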

Check this doc for the structure of the cluster definition:

https://learn.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/clusters#--request-structure...

Let me know if this helps. Please mark the comment as "best answer" to resolve the query.