Tuesday
We’re building a low-latency processing pipeline on Databricks and are running into serverless cold-start constraints.
We ingest events (calls) continuously via a Spark Structured Streaming listener.
For each event, we trigger a serverless compute run that must start immediately.
Serverless Jobs cold starts typically take 15–25 seconds, which is too slow for our use case.
To mitigate this, we attempted to keep 3 “idle” workers always running so that processing can begin immediately without paying the serverless startup penalty.
What we tried
A pool model backed by a Delta table (executor_pool_log) where:
assigning a call_id hands work to a worker
We keep 3 workers alive at all times by starting Jobs runs that wait for assignment.
Workers poll the Delta table via Spark, which adds latency; on serverless the delays were even worse than what we started with.
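For context, the polling loop each idle worker runs looks roughly like this. This is a hedged sketch: the table and column names (executor_pool_log, worker_id, call_id) follow the pool model described above, and poll_for_assignment is an illustrative helper, not our exact code.

```python
import time

def poll_for_assignment(spark, worker_id, poll_interval_s=1.0):
    """Block until a call_id is assigned to this worker in the pool table."""
    while True:
        assigned = (
            spark.read.table("executor_pool_log")
            .filter(f"worker_id = '{worker_id}' AND call_id IS NOT NULL")
            .limit(1)
            .collect()
        )
        if assigned:
            return assigned[0]["call_id"]
        # Each poll is a full Spark read of the Delta table, which is
        # where the extra latency comes from.
        time.sleep(poll_interval_s)
```

Every iteration pays the cost of a Spark batch read, which is why the polling delay compounds on serverless.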
What we’re trying to understand
Is there any supported way to guarantee a fixed amount of warm serverless compute (e.g. 3 ready workers) for Jobs workloads?
Is keeping long-running serverless job runs alive intended or recommended?
Are there known best practices for sub-second / near-real-time job execution on Databricks today (ideally serverless)?
Tuesday
Is there any specific reason as to why it has to be serverless if it has to be always on anyway? Could you not provision a small cluster of dedicated compute?
Tuesday
Hi, As @KrisJohannesen is hinting at, we don't recommend Serverless for this type of workload for exactly the reasons you've mentioned. The recommended approach would be to have a dedicated cluster that is always on and therefore no start-up time is needed, although this can have a cost implication.
yesterday
Hi Emma, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at once. For example, we might need 40 workers during a peak hour, but we don't want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this may work: a readStream where the control log is filtered for the worker's own run id:
query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)
Would this be an approach?
yesterday
Hi Kris, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at once. For example, we might need 40 workers during a peak hour, but we don't want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this may work: a readStream where the control log is filtered for the worker's own run id:
query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)
Would this be an approach?
Tuesday - last edited Tuesday
You can either:
Use an always-on cluster with appropriate min and max cluster settings.
Or
Use Serverless in performance optimized mode. You might be using standard mode, which takes time to warm up.
https://docs.databricks.com/aws/en/ldp/serverless#select-a-performance-mode
yesterday
Thank you Dbxdev,
Unfortunately, even in performance optimized mode we see a start-up time that won't work for our use case. I hear what you are saying regarding the always-on cluster, but due to the variability in our daily needs (spikes), it won't scale as well as we want. Curious if you know any other clever ways this may be optimized?
Tuesday
There is no always-warm option. Your latency-sensitive use case is a better fit for a dedicated cluster.
yesterday
Hi Raman, thank you for sharing your knowledge.
I hear what you are saying regarding the dedicated cluster, but due to the variability in our daily needs (spikes), this won't scale as well as we want. That is why we are exploring this "creative" way of running idle executors. Curious if you know any other clever ways this may be optimized?
yesterday
For streaming: refactor to one long‑running Structured Streaming job with a short trigger interval (for example, 1s) and move “assignment” logic into foreachBatch or a transactional task table updated within the micro‑batch.
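The streaming refactor above could look roughly like this. A minimal sketch, assuming a hypothetical "events" Delta source table, a "task_queue" task table, and an illustrative checkpoint path; the handle_batch body stands in for your real assignment logic.

```python
def handle_batch(batch_df, batch_id):
    # "Assignment" happens inside the micro-batch: each batch is appended
    # (or MERGEd) into the task table in one transactional write, so no
    # separate polling loop is needed.
    batch_df.write.format("delta").mode("append").saveAsTable("task_queue")

def start_pipeline(spark):
    # A short trigger interval keeps end-to-end latency near one second,
    # while the single long-lived job keeps the cluster warm between batches.
    return (
        spark.readStream
        .format("delta")
        .table("events")
        .writeStream
        .trigger(processingTime="1 second")
        .foreachBatch(handle_batch)
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .start()
    )
```

Because the job never stops, there is no cold start to pay per event; the trigger interval bounds the latency instead.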
For per‑event RPC: deploy a custom Model Serving endpoint with: