Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Query related to Job Cluster versus All-Purpose Cluster

Dileep_Karanki
New Contributor

Currently, if I run the notebook on a Job Cluster, the run shows a status of Pending for about 2 minutes, whereas on an All-Purpose Cluster it is pending for only about 6 seconds and completes quickly. How can I improve the startup time of the Job Cluster to match that of the All-Purpose Cluster?

What are the pros and cons of Job Clusters and All-Purpose Clusters?

In which scenarios should we use a Job Cluster, and in which an All-Purpose Cluster?

If ADF pipelines with Databricks notebooks are scheduled to run every 5 minutes, which cluster should we use?

 


mmayorga
Databricks Employee

Hi @Dileep_Karanki 

Thank you for reaching out and for your question.

  • Here is a link for documentation about the available cluster types:  https://docs.databricks.com/aws/en/compute#types-of-compute 
  • All-Purpose or Interactive Clusters: These are flexible, on-demand compute resources ideal for interactive data analysis, exploration, and development work. They can be shared among multiple users and manually started or stopped.
  • Job or Automated Clusters: These are created automatically by the Databricks job scheduler to run scheduled jobs and workflows. They are terminated upon job completion and optimized for specific workloads like ETL pipelines or batch processing. These are recommended for use in Production.
  • Automated "Job" clusters are roughly 50% cheaper than Interactive clusters (check the list prices for AWS, Azure, or GCP).
  • Because compute can be configured per task, here is the recommended type of compute for each task type: https://docs.databricks.com/aws/en/jobs/compute#what-is-the-recommended-compute-for-each-task 
  • For the fastest startup and performance, you may want to try serverless: https://docs.databricks.com/aws/en/jobs/run-serverless-jobs 
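
To make the three options above concrete, here is a minimal sketch of how a single notebook task might be described for each compute choice in a Jobs API style job definition. It only builds Python dictionaries and does not call any API. The helper name `notebook_task`, the task key, the notebook path, and the angle-bracket placeholder values are hypothetical and used for illustration only; treating "no cluster fields" as the serverless case is an assumption that depends on serverless being enabled in your workspace, so please verify it against the serverless documentation linked above.

```python
from typing import Any, Dict, Optional


def notebook_task(
    notebook_path: str,
    new_cluster: Optional[Dict[str, Any]] = None,
    existing_cluster_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one task entry (hypothetical helper, not a Databricks API)."""
    if new_cluster is not None and existing_cluster_id is not None:
        raise ValueError("choose new_cluster or existing_cluster_id, not both")

    task: Dict[str, Any] = {
        "task_key": "run_notebook",  # hypothetical task name
        "notebook_task": {"notebook_path": notebook_path},
    }
    if new_cluster is not None:
        # Job cluster: created for the run and terminated afterwards,
        # so every run waits for the cluster to start.
        task["new_cluster"] = new_cluster
    elif existing_cluster_id is not None:
        # All-purpose cluster: reuses a cluster that may already be running,
        # which is why the pending time can be only a few seconds.
        task["existing_cluster_id"] = existing_cluster_id
    # Neither field set: assumed here to mean serverless compute.
    return task


NOTEBOOK = "/Workspace/Shared/example_notebook"  # hypothetical path

job_cluster_task = notebook_task(
    NOTEBOOK,
    new_cluster={
        "spark_version": "<runtime-version>",  # placeholder, fill in your own
        "node_type_id": "<node-type>",         # placeholder, cloud specific
        "num_workers": 2,
    },
)
all_purpose_task = notebook_task(NOTEBOOK, existing_cluster_id="<cluster-id>")
serverless_task = notebook_task(NOTEBOOK)
```

The only difference between the three task entries is which cluster field is present, which is the trade-off described above: a new cluster per run (cheaper rate, slower start), an existing all-purpose cluster (faster start, higher rate), or no cluster specification at all.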

Another link for reference: https://docs.databricks.com/aws/en/jobs/compute#should-all-purpose-compute-ever-be-used-for-jobs 

Thank you
