In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool than the workers. We recommend using an on-demand pool for the driver node to improve reliability when spot losses are frequent, while worker nodes can still be provisioned from your spot fleet pool. As of today, this functionality is supported only via the API.
Steps to configure a cluster with hybrid pools:
1. Create (or use an existing) on-demand pool from which the driver node can be provisioned; see the sketch after these steps.
2. When creating a cluster, pass the on-demand pool ID as "driver_instance_pool_id" in the cluster creation request.
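For step 1, one way to create the on-demand driver pool is through the Instance Pools API ("POST /api/2.0/instance-pools/create"). The Python sketch below is a minimal, illustrative example; the workspace URL, token, pool name, and node type are placeholder assumptions you would replace with your own values.

import requests

# Illustrative placeholders -- replace with your workspace URL and token.
HOST = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# Create an on-demand pool for driver nodes (Instance Pools API).
# The pool name and node type here are assumptions for this example.
resp = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers=HEADERS,
    json={
        "instance_pool_name": "pool-driver-ondemand",
        "node_type_id": "i3.xlarge",
        "min_idle_instances": 1,
        "aws_attributes": {"availability": "ON_DEMAND"},
    },
)
resp.raise_for_status()
driver_pool_id = resp.json()["instance_pool_id"]
print("Driver pool id:", driver_pool_id)

The returned "instance_pool_id" is the value you pass as "driver_instance_pool_id" in step 2.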
Example API request to create a hybrid pool cluster:
{
  "num_workers": 1,
  "cluster_name": "test-hybrid-create",
  "spark_version": "7.2.x-scala2.12",
  "spark_conf": {},
  "aws_attributes": {},
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 120,
  "init_scripts": [],
  "instance_pool_id": "1109-172550-mimic2-pool-worker",
  "driver_instance_pool_id": "1109-172516-retch1-pool-driver"
}
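The request above is submitted to the Clusters API ("POST /api/2.0/clusters/create"). A minimal Python sketch, assuming the same placeholder host and token as the earlier example and the pool IDs shown in the payload:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"         # placeholder
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

# Submit the hybrid pool cluster creation request shown above.
cluster_spec = {
    "num_workers": 1,
    "cluster_name": "test-hybrid-create",
    "spark_version": "7.2.x-scala2.12",
    "autotermination_minutes": 120,
    "instance_pool_id": "1109-172550-mimic2-pool-worker",         # spot worker pool
    "driver_instance_pool_id": "1109-172516-retch1-pool-driver",  # on-demand driver pool
}

resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=HEADERS, json=cluster_spec)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])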
Assumptions & Limitations:
* Creating hybrid pool clusters is currently supported only via the API.
* We recommend testing this functionality to confirm it helps in your case before using it in production.