Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Error running 80 tasks at the same time in a Job: how can I limit this?

Maxi1693
New Contributor II

Hi! I have a Job running to process multiple streaming tables. 

In the beginning it worked fine, but the job now has 80 tables, and all the tasks try to run at the same time, which throws an error. Is there a way to limit the number of tasks that a job executes in parallel within each run?

I have configured the Job cluster with autoscaling from 2 to 4. I thought this would act as a limit, so that only 4 tasks would run at a time, but I was wrong.

The error I am getting is "Failure starting repl. Try detaching and re-attaching the notebook." From what I could find, this happens because the cluster is overloaded, but I cannot find a way to limit the number of tasks running in parallel.

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Maxi1693, it appears that you’re encountering issues with parallel execution of tasks in your Databricks job.

Let’s address this by considering a few strategies:

  1. Concurrency Limit for Tasks:

  2. Job Creation Rate:

  3. Autoscaling and Cluster Configuration:

    • While autoscaling helps manage cluster resources dynamically, it doesn’t inherently limit the number of tasks that run concurrently within a job run.
    • Instead of relying solely on autoscaling, consider setting a fixed cluster size (e.g., 4 workers) to ensure consistent resources for your tasks.
    • Be aware that the cluster configuration does not, as far as I know, offer a “Max Concurrency” or “Max Tasks” setting, and the job-level maximum concurrent runs setting limits runs of the whole job, not tasks within a single run. To cap how many tasks run at once, use task dependencies (see the next point).
  4. Notebook Execution Order:

    • If your streaming tables are processed within notebooks, ensure that they execute sequentially rather than concurrently.
    • You can use notebook workflows or job dependencies to control the order of execution.
  5. Cluster Overload and REPL Errors:

    • The “Failure starting repl” error often occurs when the cluster is overloaded due to excessive concurrent tasks.
    • Detaching and re-attaching the notebook might temporarily alleviate the issue, but it’s not a long-term solution.
    • Focus on managing concurrency and resource allocation to prevent cluster overload.
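
One way to apply the job-dependency idea from point 4 is to chain the tasks into a fixed number of "lanes", so that only that many tasks can be runnable at any moment. The sketch below only builds the `tasks` list for a job definition. The field names (`task_key`, `depends_on`, `notebook_task`, `base_parameters`) follow the Jobs API 2.1 as I understand it; the helper `build_lane_tasks`, the table names, and the notebook path are hypothetical placeholders, not something from the original post.

```python
from typing import Dict, List


def build_lane_tasks(tables: List[str], notebook_path: str, max_parallel: int) -> List[Dict]:
    """Build task definitions so that at most `max_parallel` tasks can run at once.

    Task i depends on task i - max_parallel. This forms `max_parallel`
    independent chains ("lanes"); within a lane, tasks run one after another.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    tasks = []
    for i, table in enumerate(tables):
        task = {
            "task_key": f"process_{i:03d}",
            "notebook_task": {
                "notebook_path": notebook_path,          # placeholder path
                "base_parameters": {"table": table},     # hypothetical parameter name
            },
        }
        # The first `max_parallel` tasks have no upstream task and start immediately.
        if i >= max_parallel:
            task["depends_on"] = [{"task_key": f"process_{i - max_parallel:03d}"}]
        tasks.append(task)
    return tasks


# Hypothetical table names standing in for the 80 streaming tables.
tables = [f"table_{n}" for n in range(80)]
tasks = build_lane_tasks(tables, "/Workspace/Shared/process_table", max_parallel=4)
print(len(tasks), "tasks; task 4 depends on", tasks[4]["depends_on"])
```

With the default dependency behaviour, a task runs only if its upstream task succeeded, so a failure would stop the rest of that lane. Whether that is desirable depends on how independent the tables are.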

Remember to monitor your job execution patterns, adjust cluster settings, and stagger your tasks to maintain a balance between parallelism and resource availability.
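
Another way to balance parallelism against cluster resources is to replace the 80 tasks with a single driver task that runs the per-table work through a bounded thread pool. This is a minimal sketch, assuming each table's processing finishes on its own (for example, a stream that stops once the available data is consumed). `run_bounded` and `fake_run` are hypothetical names; in a Databricks notebook, `run_one` could wrap `dbutils.notebook.run`, as shown in the comment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_bounded(tables: List[str], run_one: Callable[[str], str], max_parallel: int) -> Dict[str, str]:
    """Call `run_one` for every table, with never more than `max_parallel` calls in flight."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {table: pool.submit(run_one, table) for table in tables}
        for table, future in futures.items():
            try:
                results[table] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the remaining tables.
                results[table] = f"FAILED: {exc}"
    return results


# In a Databricks notebook, `run_one` could be something like:
#   run_one = lambda t: dbutils.notebook.run("./process_table", 3600, {"table": t})
# The notebook path, timeout, and parameter name above are placeholders.
def fake_run(table: str) -> str:
    """Stand-in for the real per-table work so the sketch runs anywhere."""
    return f"{table}: ok"


results = run_bounded([f"table_{n}" for n in range(8)], fake_run, max_parallel=4)
print(results["table_0"])
```

The trade-off is that the job UI then shows one task instead of 80, so per-table status has to be read from the returned results rather than from the task graph.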

 