Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Python SDK clusters.create_and_wait - Sourcing from cluster-create JSON

tseader
New Contributor

I am attempting to create a compute cluster using the Python SDK while sourcing a cluster-create configuration JSON file, the same JSON that the databricks-cli accepts and that Databricks provides through the GUI. Reading the JSON in as a dict fails because the SDK assumes the arguments are specific dataclass types, e.g.:

 

>       if autoscale is not None: body['autoscale'] = autoscale.as_dict()
E       AttributeError: 'dict' object has no attribute 'as_dict'
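The failure can be reproduced in isolation. The sketch below is a simplified stand-in for the SDK's behavior, not the SDK itself: `Autoscale` and `create` here are minimal mocks that mirror the pattern in the traceback (the real SDK classes have many more fields).

```python
from dataclasses import dataclass


@dataclass
class Autoscale:
    """Mock of an SDK request dataclass that provides as_dict()."""
    min_workers: int
    max_workers: int

    def as_dict(self) -> dict:
        return {"min_workers": self.min_workers, "max_workers": self.max_workers}


def create(autoscale=None) -> dict:
    # Mirrors the SDK pattern from the traceback: it assumes a
    # dataclass with as_dict(), not a plain dict parsed from JSON.
    body = {}
    if autoscale is not None:
        body["autoscale"] = autoscale.as_dict()
    return body


create(autoscale=Autoscale(min_workers=1, max_workers=4))   # works
try:
    create(autoscale={"min_workers": 1, "max_workers": 4})  # raw JSON dict
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'as_dict'
```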

 

 This is the pattern of the call I'm making:

import json

from databricks.sdk import WorkspaceClient

db_client = WorkspaceClient()
with open("my/path/to/cluster-create.json") as file:
    create_config = json.load(file)
db_client.clusters.create_and_wait(**create_config)

I've looked around in the SDK for a bootstrapping function but haven't found anything. I can certainly work around this, but it's a bit cumbersome, so I'm hoping the community can point me to the magic method I'm looking for.

Appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions

tseader
New Contributor

@Kaniz_Fatma The structure of the `cluster-create.json` is perfectly fine. The issue, as stated above, is that the SDK does not accept nested structures straight from the JSON file; they need to be cast to specific Python dataclasses first.


Here's what I came up with to get around the situation:

 

def create_compute_cluster(db_client: WorkspaceClient, cluster_conf: dict) -> str:
    cc = CreateCluster.from_dict(cluster_conf)
    refactored_input = {field: getattr(cc, field) for field in cc.__dataclass_fields__}
    return db_client.clusters.create_and_wait(**refactored_input, timeout=CLUSTER_UP_TIMEOUT)

 

I could also see the function reading the json file more like this:

 

def create_compute_cluster(db_client: WorkspaceClient, create_config_path: str) -> str:
    with open(create_config_path) as file:
        create_config = json.load(file)
    cc = CreateCluster.from_dict(create_config)
    refactored_input = {field: getattr(cc, field) for field in cc.__dataclass_fields__}
    return db_client.clusters.create_and_wait(**refactored_input, timeout=CLUSTER_UP_TIMEOUT)

 

What may make sense is adding some functions to the ClustersAPI class, unless overloading via multipledispatch is preferred. All this assumes there's a need beyond my own for this kind of pattern. 🤷
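The dict-to-dataclass-to-kwargs round trip above can be sketched with the stdlib `dataclasses` module alone. `DemoCreateCluster` and `DemoAutoscale` below are simplified stand-ins I made up for illustration; the real SDK `CreateCluster` has many more fields and its own generated `from_dict`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DemoAutoscale:
    """Simplified stand-in for the SDK's Autoscale dataclass."""
    min_workers: int
    max_workers: int


@dataclass
class DemoCreateCluster:
    """Simplified stand-in for the SDK's CreateCluster request dataclass."""
    spark_version: str
    node_type_id: str
    autoscale: Optional[DemoAutoscale] = None

    @classmethod
    def from_dict(cls, d: dict) -> "DemoCreateCluster":
        # The SDK's generated from_dict does this recursively for
        # every nested structure; here only autoscale is nested.
        auto = d.get("autoscale")
        return cls(
            spark_version=d["spark_version"],
            node_type_id=d["node_type_id"],
            autoscale=DemoAutoscale(**auto) if auto else None,
        )


def to_kwargs(cc) -> dict:
    # Same idea as the workaround above, via dataclasses.fields()
    # instead of __dataclass_fields__ / __getattribute__.
    return {f.name: getattr(cc, f.name) for f in fields(cc)}


# Values below are illustrative, not a recommended configuration.
config = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 4},
}
kwargs = to_kwargs(DemoCreateCluster.from_dict(config))
# kwargs["autoscale"] is now a dataclass instance, so the SDK-style
# autoscale.as_dict() access pattern would succeed.
```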


3 REPLIES


Kaniz_Fatma
Community Manager

Hi @tseader, If you encounter issues with the as_dict() method, consider checking the structure of your cluster-create.json file. Make sure it aligns with the expected parameters for creating a cluster.

If you need further assistance, feel free to ask! 😊

 

