Databricks Community

dmadh · ‎02-13-2025

Question:

To reduce cluster start-up times, I'm trying out the serverless compute option when triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple requests are queued for the same task on serverless compute, the execution time for the 2nd and 3rd requests increases to 1.5 to 3 minutes.

According to the query history tab, each task only takes 3-5 seconds to complete, indicating significant time spent on scheduling and resource allocation. How can I reduce this overhead to achieve a total processing time of under 10 seconds per request?
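To put the gap in numbers, here is a rough sketch that subtracts the query time from the total run time, using only the figures quoted above. The function name and the best/worst-case pairings are illustrative assumptions, not something reported by Databricks itself.

```python
def scheduling_overhead(total_seconds: float, query_seconds: float) -> float:
    """Wall-clock time of a run that was not spent executing the query."""
    return total_seconds - query_seconds


# (total run time, query time) in seconds, taken from the ranges in the post.
# Pairing the extremes this way is an assumption made for illustration.
observations = {
    "first request, low end": (40, 5),
    "first request, high end": (50, 3),
    "queued request, low end": (90, 5),
    "queued request, high end": (180, 3),
}

for label, (total, query) in observations.items():
    overhead = scheduling_overhead(total, query)
    print(f"{label}: {overhead:.0f}s of {total}s is overhead")
```

Even at the low end, most of each run is spent outside the query, so a target of under 10 seconds per request leaves only a few seconds for scheduling and resource allocation.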

Please note that I do not want concurrent runs for this use case. I depend on the queue to execute requests sequentially in FIFO order.
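For reference, the queued, one-at-a-time behaviour described here would typically be expressed in the job settings roughly as below. This is a minimal sketch assuming the Jobs API field names `max_concurrent_runs` and `queue.enabled`; the job name is hypothetical and tasks are omitted.

```python
import json

# Sketch of job settings for strictly sequential, queued runs.
# Field names are assumed from the Databricks Jobs API; verify against
# the API reference for your workspace before relying on them.
job_settings = {
    "name": "serverless-poc-job",  # hypothetical job name
    "max_concurrent_runs": 1,      # never run two requests at once
    "queue": {"enabled": True},    # extra triggers wait instead of being skipped
}

print(json.dumps(job_settings, indent=2))
```

These settings only control ordering; they do not address the scheduling delay itself.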

Alberto_Umana · ‎02-13-2025

Hello @dmadh,

At the moment there isn't a direct way to improve this. Our engineering team is working on a "speed optimized" feature and a "warm pool", but these aren't available yet.