Optimizing Task Execution Time on Databricks Serverless Compute

dmadh — Thu, 13 Feb 2025 10:55:12 GMT

Question:

To reduce cluster start-up times, I'm trying out the serverless compute option for triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple requests are queued for the same task on serverless compute, the execution time for the 2nd and 3rd requests increases to between 1.5 and 3 minutes.

According to the query history tab, each task only takes 3-5 seconds to complete, indicating significant time spent on scheduling and resource allocation. How can I reduce this overhead to achieve a total processing time of under 10 seconds per request?
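
To make the gap concrete, here is a minimal sketch that splits each request's wall-clock time into query time and everything else. The `RunTiming` class is hypothetical, and the numbers are illustrative picks from the ranges reported above (40-50 seconds, 1.5 to 3 minutes, 3-5 seconds), not measured values.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Timing for one queued request (hypothetical helper, not a Databricks API)."""
    label: str
    total_seconds: float  # wall-clock time from trigger to completion
    query_seconds: float  # time shown in the query history tab

    @property
    def overhead_seconds(self) -> float:
        # Time not spent running the query: scheduling, resource allocation, etc.
        return self.total_seconds - self.query_seconds

    @property
    def overhead_fraction(self) -> float:
        return self.overhead_seconds / self.total_seconds


# Illustrative values taken from the ranges in the post, not measurements.
runs = [
    RunTiming("1st request", total_seconds=45.0, query_seconds=4.0),
    RunTiming("2nd request", total_seconds=90.0, query_seconds=4.0),
    RunTiming("3rd request", total_seconds=180.0, query_seconds=4.0),
]

for run in runs:
    print(
        f"{run.label}: {run.overhead_seconds:.0f} s overhead "
        f"({run.overhead_fraction:.0%} of total)"
    )
```

With figures in these ranges, the query itself accounts for less than a tenth of each request, so speeding up the DataFrame code alone would not approach the 10-second target.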

Please note that I do not want concurrent runs for this use case. I rely on the queue for strictly linear, FIFO execution.
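
For reference, this kind of setup is usually expressed in the job settings roughly as below. This is a sketch under assumptions: the `max_concurrent_runs` and `queue.enabled` field names follow the Databricks Jobs API, while the job name and the `build_fifo_job_settings` helper are hypothetical. Whether queued runs are started in strict FIFO order is the expectation stated in this post, not something the sketch guarantees.

```python
import json


def build_fifo_job_settings(job_name: str) -> dict:
    """Build job settings that allow one run at a time and queue the rest.

    Hypothetical helper; only the field names come from the Jobs API.
    """
    return {
        "name": job_name,
        # Never run two instances of this job at the same time.
        "max_concurrent_runs": 1,
        # Queue additional triggers instead of skipping them.
        "queue": {"enabled": True},
    }


settings = build_fifo_job_settings("serverless-poc")  # hypothetical job name
print(json.dumps(settings, indent=2))
```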

Re: Optimizing Task Execution Time on Databricks Serverless Compute

Alberto_Umana — Thu, 13 Feb 2025 12:47:38 GMT

Hello @dmadh,

At the moment there isn't a direct way to improve this. Our engineering team is working on a "speed optimized" feature and a "warm pool", but neither is available yet.
