What is the maximum number of concurrent streaming jobs for a cluster?

jwilliam
Contributor

What is the maximum number of concurrent streaming jobs a cluster can run? How do I determine the right number of concurrent streaming jobs for different cluster configurations?

Should I use multiple clusters for different jobs, or combine them into one big cluster that handles all the jobs?



Prabakar
Databricks Employee

Hi @John William, it would be better to use a separate cluster for each streaming job.

I'm worried about the cost of this approach: spinning up a new cluster for every streaming job that runs nonstop requires a lot of resources.

Prabakar
Databricks Employee (accepted solution)

I understand, but you should also weigh the risk of running all your streaming jobs on a single cluster. Say you run 4 streaming jobs on one cluster, and one job causes the cluster to hang (or something else goes wrong on the cluster): all 4 jobs are affected. If each streaming job has its own cluster, the same problem affects only that one job, and the others keep running. That's my view; you need to weigh all the factors when planning your clusters. You can also compare the pricing of one cluster against multiple clusters.
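The isolation argument amounts to a failure-domain check: when a cluster hangs, every job placed on it stops. The sketch below is a minimal illustration, assuming four hypothetical jobs (`stream_a` to `stream_d`) and hypothetical cluster names; it only counts which jobs share the failed cluster and does not model any Databricks behavior.

```python
# Hypothetical job names, for illustration only.
JOBS = ["stream_a", "stream_b", "stream_c", "stream_d"]

# Layout 1: every job runs on the same shared cluster.
shared_layout = {job: "shared-cluster" for job in JOBS}

# Layout 2: each job gets its own dedicated cluster (hypothetical names).
dedicated_layout = {job: f"{job}-cluster" for job in JOBS}


def jobs_affected(layout: dict[str, str], failed_cluster: str) -> list[str]:
    """Return the jobs that stop when `failed_cluster` hangs or fails."""
    return [job for job, cluster in layout.items() if cluster == failed_cluster]


print(len(jobs_affected(shared_layout, "shared-cluster")))      # 4
print(len(jobs_affected(dedicated_layout, "stream_a-cluster")))  # 1
```

With a shared cluster, one failure takes down all four streams; with dedicated clusters, it takes down one.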

For example, for 4 streaming jobs, a single cluster with an i3.4xlarge driver and 10 i3.4xlarge workers uses 44 DBU/hr.

If I instead use one cluster per job, 4 smaller clusters, each with an i3.xlarge driver and 10 i3.xlarge workers, also cost 44 DBU/hr in total (11 DBU/hr per cluster).

This way you can estimate the workload and the pricing, and decide on the cluster sizing.
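The DBU comparison in this reply can be reproduced with a small calculator. This is a sketch, assuming per-node rates of 1 DBU/hr for i3.xlarge and 4 DBU/hr for i3.4xlarge, which are the rates implied by the reply's totals (11 nodes per cluster, counting the driver). `ClusterSpec`, `DBU_PER_NODE_HOUR`, and the 30-day month are hypothetical choices for illustration; check current Databricks pricing for your cloud and tier, and note that DBU charges do not include the cloud provider's instance costs.

```python
from dataclasses import dataclass

# Assumed per-node DBU/hr rates, back-derived from the figures in the reply
# (44 DBU/hr for 11 i3.4xlarge nodes, 11 DBU/hr for 11 i3.xlarge nodes).
DBU_PER_NODE_HOUR = {
    "i3.xlarge": 1.0,
    "i3.4xlarge": 4.0,
}


@dataclass
class ClusterSpec:
    instance_type: str  # same instance type for driver and workers
    num_workers: int

    def dbu_per_hour(self) -> float:
        # Driver + workers, all billed at the same per-node rate.
        nodes = self.num_workers + 1
        return nodes * DBU_PER_NODE_HOUR[self.instance_type]


def total_dbu_per_hour(clusters: list[ClusterSpec]) -> float:
    return sum(c.dbu_per_hour() for c in clusters)


# Option 1: one large shared cluster for all 4 streaming jobs.
shared = [ClusterSpec("i3.4xlarge", num_workers=10)]

# Option 2: one smaller cluster per streaming job.
dedicated = [ClusterSpec("i3.xlarge", num_workers=10) for _ in range(4)]

HOURS_PER_MONTH = 24 * 30  # assumption: streaming jobs run nonstop for 30 days

for name, layout in [("shared", shared), ("dedicated", dedicated)]:
    per_hour = total_dbu_per_hour(layout)
    print(f"{name}: {per_hour:g} DBU/hr, {per_hour * HOURS_PER_MONTH:g} DBU/month")
```

Both layouts come out at 44 DBU/hr (31680 DBU over a 30-day month of nonstop streaming), so under these assumptions the choice rests on isolation rather than DBU consumption.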
