Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How to optimize Workflow job startup time

Rahul14Gupta
New Contributor II

We want to create a workflow pipeline in which we trigger a Databricks workflow job from AWS. However, the startup time of Databricks workflow jobs on job compute is over 10 minutes, which is causing issues.

We would like to either avoid this startup time or reduce it to around 1 minute. Is there any way to achieve this? We tried creating a pool, but there doesn't seem to be an option to select a pool in the workflow job configuration.

Does anyone have suggestions, or a better way to optimize this?

We have also tried serverless compute, but its limitations make it unsuitable for our use case.

Accepted Solution

Alberto_Umana
Databricks Employee

Hi @Rahul14Gupta,

One option is to use serverless compute for jobs, which can significantly reduce startup time. Serverless compute is designed to start quickly and can be a good fit for workloads that need fast initialization.

If serverless does not fit your workload, you can still use pools with jobs:

Create a Pool:

  • Ensure you have the necessary permissions to create a pool. By default, only workspace admins have pool creation permissions.
  • Use the Databricks UI to create a pool:
    • Navigate to the Pools page and click the "Create Pool" button.
    • Specify the pool configuration, including instance types, minimum idle instances, maximum capacity, and idle instance auto termination time.
    • Click the "Create" button.
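The same pool can also be created through the Instance Pools REST API instead of the UI. This is a minimal sketch, assuming you supply your own workspace URL and personal access token: it builds the request body for `POST /api/2.0/instance-pools/create` using the fields that correspond to the UI settings above. The pool name, node type, and sizes are placeholder values, and `build_pool_request` / `create_pool` are hypothetical helper names.

```python
import json
import urllib.request


def build_pool_request(name, node_type_id, min_idle, max_capacity, idle_minutes):
    """Build the JSON body for POST /api/2.0/instance-pools/create (placeholder values)."""
    if min_idle > max_capacity:
        raise ValueError("min_idle_instances cannot exceed max_capacity")
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Instances kept running and idle so new clusters can skip VM provisioning.
        "min_idle_instances": min_idle,
        # Upper bound on instances (idle + in use) the pool may hold.
        "max_capacity": max_capacity,
        # Idle instances above min_idle_instances are released after this many minutes.
        "idle_instance_autotermination_minutes": idle_minutes,
    }


def create_pool(host, token, body):
    """POST the body to the workspace; the JSON response contains the new instance_pool_id."""
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-pools/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the body; calling create_pool needs a real host and token.
    body = build_pool_request("jobs-warm-pool", "i3.xlarge", 2, 10, 30)
    print(json.dumps(body, indent=2))
```

`min_idle_instances` is the setting that actually shortens job startup, because those instances are already running when a job cluster requests them; startup is typically shorter but not instant. Idle pool instances do not incur DBU charges, but AWS still bills for them while they sit idle, so size this to the number of job clusters you expect to start concurrently.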

Attach a Cluster to a Pool:

  • When configuring a cluster, you can attach it to a pool by selecting the pool from the Driver Type or Worker Type dropdown in the cluster creation UI. Available pools will be listed at the top of each dropdown list.
  • If you are using the Clusters API, specify driver_instance_pool_id for the driver node and instance_pool_id for the worker nodes.
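For a workflow job defined through the API, the same two fields go into the job cluster's `new_cluster` spec. This sketch assumes Jobs API 2.1 (`POST /api/2.1/jobs/create`) and a single notebook task; the job name, notebook path, Spark version string, and pool IDs are placeholders, and `build_job_with_pool` is a hypothetical helper. `node_type_id` is left out because the pool already fixes the instance type.

```python
import json


def build_job_with_pool(job_name, notebook_path, spark_version, pool_id,
                        num_workers, driver_pool_id=None):
    """Build a Jobs API 2.1 create body whose job cluster draws nodes from a pool."""
    new_cluster = {
        "spark_version": spark_version,
        # Worker nodes come from this pool; no node_type_id because the pool defines it.
        "instance_pool_id": pool_id,
        "num_workers": num_workers,
    }
    # If no driver pool is set, Databricks uses instance_pool_id for the driver as well.
    if driver_pool_id is not None:
        new_cluster["driver_instance_pool_id"] = driver_pool_id
    return {
        "name": job_name,
        "job_clusters": [{"job_cluster_key": "pooled_cluster", "new_cluster": new_cluster}],
        "tasks": [
            {
                "task_key": "main",
                # Task runs on the pooled job cluster defined above.
                "job_cluster_key": "pooled_cluster",
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


if __name__ == "__main__":
    job = build_job_with_pool("startup-test", "/Workspace/placeholder/notebook",
                              "15.4.x-scala2.12", "<pool-id>", 2)
    print(json.dumps(job, indent=2))
```

The resulting dict can be serialized with `json.dumps` and sent as the request body with the workspace's bearer token. Keeping the cluster spec in `job_clusters` lets several tasks in the same job share one pooled cluster instead of each starting its own.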


Replies


I am now able to select an instance pool while creating a Databricks job, but I can't figure out what it costs. I can only select an instance pool when I choose the Unrestricted policy; when I choose Job Compute, the instance pool option is not available. Can you confirm whether the cost incurred under the Unrestricted policy is billed as all-purpose compute or jobs compute?
