02-08-2022 06:51 AM
Continuing the above case, does that mean that if I have several (say 5) ADF pipelines scheduled regularly at the same time, it's better to use an existing cluster, since all of the ADF pipelines would share the same cluster and hence the cost would be lower?
03-15-2022 10:03 PM
For ADF or job runs we always prefer a job cluster, but for streaming you may consider using an interactive cluster. Either way, you need to monitor the cluster load: if the load is high, jobs may slow down or even fail. Data size will also be a factor. @nafri A
02-09-2022 01:54 AM
ADF pipelines will execute the notebooks as follows:
If you use a dedicated cluster and run 2 notebooks simultaneously on it, it will actually run both.
Mind that dedicated clusters are more expensive than job clusters.
The same can be achieved using a notebook workflow, where you call notebooks in parallel from one single notebook (which is scheduled in ADF). That way you do not have to use a dedicated cluster and can use a job cluster instead.
The main downside of this is that your cluster may get hammered because of the parallel runs. Not necessarily, but that is definitely a concern.
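As a rough illustration of the notebook workflow idea, here is a minimal sketch of a driver notebook that runs child notebooks in parallel on whatever cluster the driver itself runs on. The function name `run_notebooks_in_parallel`, the notebook paths and the `max_workers` value are my own assumptions for illustration; inside Databricks you would pass `dbutils.notebook.run` as the runner. Capping `max_workers` is one simple way to limit how hard the cluster gets hit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_in_parallel(paths, run_notebook, max_workers=2, timeout_seconds=3600):
    """Run each notebook via run_notebook(path, timeout, args); return {path: result}.

    run_notebook is injected so the function can be tried outside Databricks;
    in a Databricks notebook it would be dbutils.notebook.run.
    """
    results = {}
    # max_workers caps how many notebooks run at once on the shared cluster.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            path: pool.submit(run_notebook, path, timeout_seconds, {})
            for path in paths
        }
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other notebooks.
                results[path] = f"FAILED: {exc}"
    return results


# Inside a Databricks driver notebook (paths are hypothetical):
# results = run_notebooks_in_parallel(
#     ["./ingest_orders", "./ingest_customers"], dbutils.notebook.run
# )
```

The driver notebook is the only thing ADF schedules, so all child notebooks share the one job cluster that ADF creates for that run.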
So you could also opt for a cluster pool, which you can use in ADF. It is not exactly the same as using a single cluster, but workers that are no longer needed can be used by other jobs until they time out after x minutes of inactivity.
So depending on your scenario you can go one way or another.
Me, I never use dedicated clusters because of the price. So I use separate job clusters, notebook workflows and pools.
02-28-2022 01:29 AM
Since all the pipelines are orchestrated via ADF, we are mostly using dedicated clusters, but the sizes are small. So the idea of running multiple notebooks via a single notebook is not an optimal solution in this case.
So I think that if you have several pipelines and each one uses a job cluster, it would end up costing more, since job clusters are created on the fly and are not shared between jobs.
02-28-2022 01:33 AM
With notebook workflows you can use a job cluster for several notebooks simultaneously.
Only pay attention to the cluster load.
This is the cheapest option.
Cluster pools are also an option, as you can use spot instances and save money on the startup times of the nodes.