a week ago
We’re building a low-latency processing pipeline on Databricks and are running into serverless cold-start constraints.
We ingest events (calls) continuously via a Spark Structured Streaming listener.
For each event, we trigger a serverless compute run that must start immediately.
Jobs Serverless cold starts typically take 15–25 seconds, which is too slow for our use case.
To mitigate this, we attempted to keep 3 “idle” workers always running so that processing can begin immediately without paying the serverless startup penalty.
What we tried
A pool model backed by a Delta table (executor_pool_log) where:
- assigning a call_id to a pool row hands that work to a worker
- we keep 3 workers alive at all times by starting Jobs runs that wait for assignment
- workers poll the Delta table via Spark, which adds latency of its own and, on serverless, causes even worse delays than we had originally (sketched below)
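Roughly, each warm run executes a polling loop like the sketch below. Column names, worker_id, and process_call are illustrative placeholders, not our exact code:

import time
from pyspark.sql import functions as F

POLL_INTERVAL_S = 1.0
worker_id = "worker-0"  # in practice, derived from the Jobs run id

def process_call(call_id: str) -> None:
    # Placeholder for the real per-call processing logic.
    pass

while True:
    # Look for a call assigned to this worker in the pool table.
    assigned = (
        spark.table("executor_pool_log")
        .filter((F.col("worker_id") == worker_id) & (F.col("status") == "assigned"))
        .select("call_id")
        .limit(1)
        .collect()
    )
    if assigned:
        call_id = assigned[0]["call_id"]
        process_call(call_id)
        # Mark the row done so the dispatcher can reuse this worker.
        spark.sql(
            f"UPDATE executor_pool_log SET status = 'done' "
            f"WHERE worker_id = '{worker_id}' AND call_id = '{call_id}'"
        )
    else:
        # This Spark-based polling is where the extra latency comes from.
        time.sleep(POLL_INTERVAL_S)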
What we’re trying to understand
Is there any supported way to guarantee a fixed amount of warm serverless compute (e.g. 3 ready workers) for Jobs workloads?
Is keeping long-running serverless job runs alive intended or recommended?
Are there known best practices for sub-second / near-real-time job execution on Databricks today (ideally serverless)?
a week ago
Is there a specific reason why it has to be serverless if it has to be always on anyway? Could you not provision a small cluster of dedicated compute?
a week ago
Hi, as @KrisJohannesen is hinting at, we don't recommend serverless for this type of workload, for exactly the reasons you've mentioned. The recommended approach would be a dedicated cluster that is always on, so no start-up time is needed, although this can have a cost implication.
Wednesday
Hi Emma, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at the same time. For example, we might need 40 workers during a peak hour, but we don't want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this might work: a readStream where each worker filters the control log stream for its own run id:

from pyspark.sql.functions import col

query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)  # each run sees only rows addressed to it
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)

Would this be a workable approach?
Wednesday
Hi Kris, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at the same time. For example, we might need 40 workers during a peak hour, but we don't want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this might work: a readStream where each worker filters the control log stream for its own run id:

from pyspark.sql.functions import col

query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)  # each run sees only rows addressed to it
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)

Would this be a workable approach?
a week ago - last edited a week ago
You can either:
Use an always-on cluster with appropriate min and max autoscaling settings (see the sketch after the link below).
Or
Use Serverless in performance-optimized mode. You might be using standard mode, which takes time to warm up.
https://docs.databricks.com/aws/en/ldp/serverless#select-a-performance-mode
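For the first option, a sketch of an always-on autoscaling cluster via the Clusters API. Host, token, node type, and Spark version are placeholders; the autoscale bounds and disabled auto-termination are the relevant parts:

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.1/clusters/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "cluster_name": "always-on-low-latency",
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type>",
        # Keep a small floor for idle periods and a ceiling for peaks.
        "autoscale": {"min_workers": 3, "max_workers": 40},
        # 0 disables auto-termination, keeping the cluster always on.
        "autotermination_minutes": 0,
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])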
Wednesday
Thank you Dbxdev,
Unfortunately, even in performance-optimized mode we get a start-up time that won't work for our use case. I hear what you are saying regarding the always-on cluster, but due to the variability in our daily needs (spikes), it won't scale as well as we want. Curious if you know any other clever ways this may be optimized?
a week ago
There is no always-warm option for serverless jobs. Your latency-sensitive use case is a better fit for a dedicated cluster.
Wednesday
Hi Raman, thank you for sharing your knowledge.
I hear what you are saying regarding the dedicated cluster, but due to the variability in our daily needs (spikes), this won't scale as well as we want. That is why we are exploring this "creative" way of running idle executors. Curious if you know any other clever ways this may be optimized?
Wednesday
For streaming: refactor to one long‑running Structured Streaming job with a short trigger interval (for example, 1s) and move “assignment” logic into foreachBatch or a transactional task table updated within the micro‑batch (see the first sketch below).
For per‑event RPC: deploy a custom Model Serving endpoint with scale-to-zero disabled, so warm capacity is always available (see the second sketch below).
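A minimal sketch of the streaming refactor, assuming a Delta source table; the table name, checkpoint path, and process_call are illustrative:

from pyspark.sql import DataFrame

def process_call(call_id: str) -> None:
    # Placeholder for the per-event handler.
    pass

def handle_tasks(batch_df: DataFrame, batch_id: int) -> None:
    # Assignment logic runs here, inside the micro-batch, instead of
    # having warm workers poll a control table.
    for row in batch_df.select("call_id").collect():
        process_call(row["call_id"])

query = (
    spark.readStream
    .format("delta")
    .table("events")
    .writeStream
    .foreachBatch(handle_tasks)
    .trigger(processingTime="1 second")  # short trigger keeps end-to-end latency low
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)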
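And a sketch of the per-event RPC route: creating a Model Serving endpoint with scale-to-zero disabled via the Serving Endpoints API. Endpoint name, model name, version, and workload size are placeholders:

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/serving-endpoints",
    headers={"Authorization": "Bearer <token>"},
    json={
        "name": "per-event-rpc",
        "config": {
            "served_entities": [
                {
                    "entity_name": "<registered-model-name>",
                    "entity_version": "<version>",
                    "workload_size": "Small",
                    # Disabling scale-to-zero keeps warm capacity available,
                    # at the cost of paying for it while idle.
                    "scale_to_zero_enabled": False,
                }
            ]
        },
    },
)
resp.raise_for_status()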