Cluster Reuse for delta live tables
10-21-2022 09:40 AM
I have several Delta Live Tables notebooks tied to different DLT jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster across job segments, but is it possible for these DLT jobs (which run in sequence) to reuse the cluster created by the first job? I'm running into quota issues in the customer's Azure environment, and this would help a lot.
It seems that if I run an individual job (let's call it silver) multiple times, it picks up the cluster it used before, but if I run another job (let's call it gold), it tries to start its own cluster even though both are configured with the same cluster configuration.
- Labels:
  - Delta
  - Delta Live Tables
  - Job Cluster
  - Jobs
10-23-2022 02:19 PM
The same DLT job (workflow) reuses its cluster in development mode (the cluster shuts down after 2 hours of inactivity) but starts a new one in production mode (shutdown delay of 0). You can override that delay in the pipeline's JSON settings:
```json
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  }
}
```
You can work around Azure quota limits by using different instance types, and for smaller streams you can set the worker count to 0 (single-node):
```json
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D3_v2",
      "driver_node_type_id": "Standard_D3_v2",
      "num_workers": 0
    }
  ]
}
```
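Putting the two settings together, a pipeline's full JSON settings might look like the sketch below. This is only an illustration: the pipeline name and notebook path are placeholders, and the 60-second shutdown delay and Standard_D3_v2 node type are example values you would tune for your own quota situation.

```json
{
  "name": "example-pipeline",
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  },
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D3_v2",
      "driver_node_type_id": "Standard_D3_v2",
      "num_workers": 0
    }
  ],
  "libraries": [
    { "notebook": { "path": "/Repos/example/dlt_notebook" } }
  ]
}
```

With a longer shutdown delay, a second run of the same pipeline that starts within the delay window can pick up the still-running cluster instead of requesting new VMs from the Azure quota.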
I hope that instance pools and serverless options will be added to DLT.
10-24-2022 11:16 AM
Thank you for sharing this, @Hubert Dudek. @John Fico, I highly recommend following Hubert's recommendations. If you would like to check our docs, please see https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-configuration.html#cluster...

