What is the maximum of concurrent streaming jobs for a cluster?

jwilliam
Contributor

What is the maximum number of concurrent streaming jobs for a cluster? How can I determine the right number of concurrent streaming jobs for different cluster configurations?

Should I use multiple clusters for different jobs, or combine them into one big cluster that handles all the jobs?


4 REPLIES

Prabakar
Esteemed Contributor III

Hi @John William, it would be better to use a separate cluster for each streaming job.
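
For context, Spark Structured Streaming does not impose a hard documented maximum on concurrent streaming queries per cluster; the practical ceiling is driver memory and scheduling overhead. Below is a minimal sketch of how several queries can share one cluster while staying isolated in separate fair-scheduler pools. The Delta paths and checkpoint locations are hypothetical placeholders, and it assumes the cluster is configured with spark.scheduler.mode set to FAIR.

```python
# Minimal sketch: several Structured Streaming queries sharing one cluster.
# Assumes spark.scheduler.mode=FAIR is set on the cluster; all paths below
# are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sources = ["events", "orders", "clicks", "metrics"]  # hypothetical stream names

queries = []
for name in sources:
    # Put each query in its own fair-scheduler pool so one slow or stuck
    # query does not monopolise task slots on the shared cluster.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", name)

    df = spark.readStream.format("delta").load(f"/data/raw/{name}")
    q = (df.writeStream
           .format("delta")
           .option("checkpointLocation", f"/chk/{name}")  # one checkpoint per query
           .start(f"/data/bronze/{name}"))
    queries.append(q)

# All queries run concurrently on the same driver; block until any one stops.
spark.streams.awaitAnyTermination()
```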

jwilliam
Contributor

I'm worried about the cost of this approach; spinning up a new cluster for every streaming job that runs non-stop requires a lot of resources.

Prabakar
Esteemed Contributor III

I understand, but weigh the risk of using a single cluster for all your streaming jobs. Say you run 4 streaming jobs on one cluster and one of them puts the cluster into a hung state, or something else goes wrong on the cluster; then all 4 jobs are affected. With a separate cluster for each streaming job, only the job on the failing cluster is affected and the others keep running. That is my thinking; you need to weigh all the factors and plan the clusters accordingly. You can also compare the pricing of one cluster versus multiple clusters.

Let's say that for 4 streaming jobs I use a single cluster of i3.4xlarge instances with 10 workers of the same type; that costs about 44 DBU/hr.

If instead I use 1 cluster per job, 4 smaller clusters of i3.xlarge instances with 10 workers each also cost about 44 DBU/hr in total (11 DBU/hr per cluster).

This way you can weigh the workload against the pricing and decide on the cluster sizing.
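
To make that arithmetic concrete, here is a small sketch of the comparison. The per-node DBU rates are assumptions for illustration (roughly 1 DBU/hr for i3.xlarge and 4 DBU/hr for i3.4xlarge) and should be checked against the Databricks pricing page for your cloud and workload type.

```python
# Rough DBU comparison for the example above. The per-node rates are
# assumptions for illustration; check the Databricks pricing page for
# your cloud and workload type.
DBU_PER_NODE_HR = {"i3.xlarge": 1.0, "i3.4xlarge": 4.0}

def cluster_dbu_per_hour(instance_type: str, workers: int) -> float:
    """DBU/hr for one cluster: driver plus workers, all the same instance type."""
    return (workers + 1) * DBU_PER_NODE_HR[instance_type]

# Option A: one big cluster running all 4 streaming jobs.
shared = cluster_dbu_per_hour("i3.4xlarge", workers=10)   # 44.0 DBU/hr

# Option B: one smaller cluster per job, 4 clusters in total.
per_job = cluster_dbu_per_hour("i3.xlarge", workers=10)   # 11.0 DBU/hr
isolated = 4 * per_job                                    # 44.0 DBU/hr

print(f"shared: {shared} DBU/hr, isolated: {isolated} DBU/hr")
```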

Kaniz
Community Manager

Hi @John William, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button if the information provided helped resolve your question.
