How to setup an all-purpose cluster pool for all my jobs?

Siebert_Looije
Contributor

Today we started setting up an all-purpose cluster pool for all the jobs we run on Databricks. We followed the documentation, but we ran into issues when running our jobs.

The errors in the jobs are the following:

[Image: error message]

The jobs run in parallel. To illustrate:

[Image: jobs]

The pool has the following configuration:

[Image: pool configuration]

The cluster has the following configuration:

[Image: cluster configuration]

The log4j-active log is in the attachment.

Furthermore, I saw that autoscaling didn't scale up when multiple jobs ran at the same time. When multiple jobs run on the same pool, it should autoscale, right?

Thanks for your time!

4 REPLIES

Hubert-Dudek
Esteemed Contributor III
  • Autoscaling scales up only when the workload requires it (for example, because of the size of the dataset). Another job will create a new cluster that takes idle machines from the pool or, if none are idle, deploys new ones.
  • So the pool is designed so that another job can reuse VMs. I see two strategies:

1) set min idle to some number, so that machines are waiting to handle your jobs, and reserve those machines to get a discount,

2) or just the opposite: set min idle to 0 and use spot instances.

  • Regarding the errors, please check that you haven't hit quotas at your cloud provider (for example, in the Azure portal, type "quotas" in the search box).
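To make the two strategies concrete, here is a minimal Python sketch that builds the two pool definitions as request bodies. The field names follow the Databricks Instance Pools API as I understand it, so please verify them against the current reference. The helper `pool_payload`, the pool names, the node type, and all numbers are assumptions chosen for illustration, not values from this thread.

```python
import json


def pool_payload(name, node_type_id, min_idle, use_spot, max_capacity):
    """Build a request body for creating an instance pool (illustrative sketch)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Number of VMs kept warm even when no cluster is using them.
        "min_idle_instances": min_idle,
        # Upper bound on idle plus in-use VMs in this pool.
        "max_capacity": max_capacity,
        # Assumed value: release extra idle VMs after 15 minutes.
        "idle_instance_autotermination_minutes": 15,
        "azure_attributes": {
            "availability": "SPOT_AZURE" if use_spot else "ON_DEMAND_AZURE"
        },
    }


# Strategy 1: keep some on-demand machines idle so jobs start quickly.
warm_pool = pool_payload("jobs-warm", "Standard_DS3_v2",
                         min_idle=4, use_spot=False, max_capacity=20)

# Strategy 2: keep nothing idle and rely on cheaper spot instances.
spot_pool = pool_payload("jobs-spot", "Standard_DS3_v2",
                         min_idle=0, use_spot=True, max_capacity=20)

print(json.dumps(warm_pool, indent=2))
print(json.dumps(spot_pool, indent=2))
```

Either body could then be sent to the pool creation endpoint. The trade-off is start-up latency (strategy 1) against cost (strategy 2).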

Thanks for the explanation!

What should I define for the cluster then? We have quite a few jobs that run in parallel: should I define multiple clusters and attach them to the pool, or is there a better way to add multiple clusters to the pool? At the moment a new cluster is used for each job.

Currently we start a job cluster per job and do not reuse it. We would like to find a way to reuse the job cluster (I thought this was what the pool feature was for).

What does the 'Failure starting repl.' error mean? Knowing that would help me narrow down which quotas could be hit.
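As a rough illustration of the setup described in the reply above (each parallel job still gets its own cluster, but the clusters draw their VMs from one shared pool), a per-job cluster spec might look like the sketch below. The `instance_pool_id` and `autoscale` fields are taken from the Databricks cluster spec as I understand it. The pool ID, the runtime version string, the helper names, and the assumption that each cluster takes one driver plus its workers from the same pool are all placeholders to be checked against your workspace.

```python
def job_cluster_spec(pool_id, min_workers, max_workers):
    """Cluster spec for one job, taking its VMs from a shared pool (sketch)."""
    return {
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "instance_pool_id": pool_id,          # hypothetical pool ID
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


def peak_instances(specs):
    """Worst-case VM count: one driver plus max_workers for every cluster."""
    return sum(1 + spec["autoscale"]["max_workers"] for spec in specs)


def fits_in_pool(specs, pool_max_capacity):
    """True if all clusters can scale fully without exceeding the pool's capacity."""
    return peak_instances(specs) <= pool_max_capacity


# Three jobs running in parallel, each allowed to scale from 1 to 4 workers.
parallel_jobs = [job_cluster_spec("pool-123", 1, 4) for _ in range(3)]

print(peak_instances(parallel_jobs))
print(fits_in_pool(parallel_jobs, pool_max_capacity=20))
print(fits_in_pool(parallel_jobs, pool_max_capacity=10))
```

Comparing the worst-case count with both the pool's maximum capacity and the cloud provider's core quota is one way to check whether parallel jobs could be starved of machines.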

Thanks for taking the time to answer the question!

Anonymous
Not applicable

Hi @Siebert Looije

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Hi @Vidula Khanna, thanks for reaching out. No, I haven't really found a solution to this yet. I had some follow-up questions, which have not been answered so far.
