
Optimizing Task Execution Time on Databricks Serverless Compute

dmadh
New Contributor

Question:

To reduce cluster start-up times, I am trying out the serverless compute option when triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple requests are queued for the same task on serverless compute, the execution time for the 2nd and 3rd requests increases to 1.5 to 3 minutes.

According to the query history tab, each task only takes 3-5 seconds to complete, indicating significant time spent on scheduling and resource allocation. How can I reduce this overhead to achieve a total processing time of under 10 seconds per request?
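To put rough numbers on that overhead, the figures above can be subtracted from each other. This is only a back-of-the-envelope sketch: the helper name `overhead_range` is made up for illustration, and it assumes the 3-5 second execution time from query history applies to every request.

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) in seconds


def overhead_range(total: Range, execution: Range) -> Range:
    """Bounds on time NOT spent executing (queueing, scheduling, allocation).

    Lowest overhead  = shortest total run minus longest execution.
    Highest overhead = longest total run minus shortest execution.
    """
    low = total[0] - execution[1]
    high = total[1] - execution[0]
    if low < 0:
        raise ValueError("execution time cannot exceed total run time")
    return (low, high)


# Figures reported in the question, converted to seconds.
execution = (3.0, 5.0)          # per query history
first_request = (40.0, 50.0)    # a single, unqueued request
queued_request = (90.0, 180.0)  # 2nd and 3rd queued requests (1.5 to 3 minutes)

print("first request overhead (s):", overhead_range(first_request, execution))
print("queued request overhead (s):", overhead_range(queued_request, execution))
```

Under those assumptions, most of each run is overhead rather than query execution, which is why the 10 second target depends on scheduling and resource allocation and not on the query itself.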

Please note that I do not want concurrent runs for this use case. I rely on the queue to execute requests linearly, in FIFO order.
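For context, the no-concurrency requirement could be expressed in the job settings roughly as follows. This is a sketch, not my exact configuration: the job name is a placeholder, the `build_job_settings` helper is hypothetical, and I am assuming the `max_concurrent_runs` and `queue` fields of the Jobs API behave as documented. Task definitions are left out.

```python
import json


def build_job_settings(name: str) -> dict:
    """Build a partial Jobs API settings payload (tasks omitted).

    Assumption: with queueing enabled and a concurrency limit of 1,
    extra triggers wait in the queue instead of running in parallel.
    """
    return {
        "name": name,                 # placeholder job name
        "max_concurrent_runs": 1,     # never run two requests at once
        "queue": {"enabled": True},   # hold extra triggers until the run finishes
    }


settings = build_job_settings("serverless-poc")
print(json.dumps(settings, indent=2))
```

This only controls how requests are serialized. It does not change the per-run start-up overhead described above.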

1 REPLY

Alberto_Umana
Databricks Employee

Hello @dmadh,

At the moment there isn't a direct way to reduce this overhead. Our engineering team is working on a "speed optimized" feature and a "warm pool", but neither is available yet.
