01-20-2026 06:57 AM
We’re building a low-latency processing pipeline on Databricks and are running into serverless cold-start constraints.
We ingest events (calls) continuously via a Spark Structured Streaming listener.
For each event, we trigger serverless compute that must start immediately.
Jobs Serverless cold starts typically take 15–25 seconds, which is too slow for our use case.
To mitigate this, we attempted to keep 3 “idle” workers always running so that processing can begin immediately without paying the serverless startup penalty.
What we tried
A pool model backed by a Delta table (executor_pool_log), where:
- assigning a call_id hands work to a worker
- we keep 3 workers alive at all times by starting Jobs runs that wait for assignment
- workers poll the Delta table via Spark, which adds latency of its own and, on serverless, causes even worse delays than the cold starts we started with
What we’re trying to understand
Is there any supported way to guarantee a fixed amount of warm serverless compute (e.g. 3 ready workers) for jobs workloads?
Is keeping long-running serverless job runs alive intended or recommended?
Are there known best practices for sub-second / near-real-time jobs execution on Databricks today (ideally serverless)?
01-20-2026 07:52 AM
Is there any specific reason as to why it has to be serverless if it has to be always on anyway? Could you not provision a small cluster of dedicated compute?
01-20-2026 09:50 AM
Hi, as @KrisJohannesen is hinting at, we don’t recommend Serverless for this type of workload, for exactly the reasons you’ve mentioned. The recommended approach would be a dedicated cluster that is always on, so no start-up time is needed, although this can have a cost implication.
01-21-2026 01:33 AM
Hi Emma, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at the same time. For example, we might need 40 workers during a peak hour, but we don’t want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this might work: a readStream where each worker filters the log stream for its own run id:
from pyspark.sql.functions import col

query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)
Would this be a viable approach?
01-21-2026 01:34 AM
Hi Kris, thank you for your answer!
The main reason we are leaning toward serverless is cost efficiency during idle periods. Our workload is very spiky: for most of the day traffic is low, but during short (unpredictable) peak windows we can receive many events at the same time. For example, we might need 40 workers during a peak hour, but we don’t want to pay for 40 idle workers during the rest of the day.
I understand this is not the recommended approach, but I'm curious whether this might work: a readStream where each worker filters the log stream for its own run id:
from pyspark.sql.functions import col

query = (
    spark.readStream
    .format("delta")
    .table("control_log")
    .filter(col("target_run_id") == run_id)
    .writeStream
    .foreachBatch(handle_tasks)
    .start()
)
Would this be a viable approach?
01-20-2026 10:05 AM - edited 01-20-2026 10:07 AM
You can either:
Use an always-on cluster with appropriate min and max autoscaling settings.
Or
Use Serverless in performance-optimized mode. You might be using standard mode, which takes time to warm up.
https://docs.databricks.com/aws/en/ldp/serverless#select-a-performance-mode
01-21-2026 01:37 AM
Thank you Dbxdev,
Unfortunately, even in performance-optimized mode the startup time is too long for our use-case. I hear what you are saying regarding the always-on cluster, but due to the variability in our daily needs (spikes), this won’t scale as well as we want. Curious if you know any other clever ways this may be optimized?
01-20-2026 02:24 PM
There is no always-warm option for serverless. Your latency-sensitive use-case is a better fit for a dedicated cluster.
01-21-2026 01:40 AM
Hi Raman, thank you for sharing your knowledge.
I hear what you are saying regarding the dedicated cluster, but due to the variability in our daily needs (spikes), this won't scale as well as we want. That is why we are exploring this "creative" way of running idle executors. Curious if you know any other clever ways this may be optimized?
01-21-2026 07:51 AM
For streaming: refactor to one long‑running Structured Streaming job with a short trigger interval (for example, 1s) and move “assignment” logic into foreachBatch or a transactional task table updated within the micro‑batch.
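To make the foreachBatch idea concrete, here is a minimal sketch of what the assignment step could look like as a pure function; assign_round_robin, call_ids, and workers are illustrative names invented for this sketch, not part of any Databricks API.

```python
def assign_round_robin(call_ids, workers):
    """Map each incoming call_id to a worker, cycling through the worker list.

    Inside foreachBatch you would collect the micro-batch's new call_ids,
    compute assignments like this, and write them back to a task table in
    the same micro-batch, instead of having idle workers poll for work.
    """
    if not workers:
        raise ValueError("need at least one worker")
    return {cid: workers[i % len(workers)] for i, cid in enumerate(call_ids)}
```

The point of this shape is that assignment happens once per micro-batch inside the streaming job, so there is no separate polling loop adding read-process-write latency on top of the trigger interval.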
For per‑event RPC: deploy a custom Model Serving endpoint with:
3 weeks ago
Hi @fintech_latency,
This is a common architectural challenge when you need sub-second or near-real-time event processing. Let me walk through your three questions and then outline the recommended patterns.
QUESTION 1: CAN YOU GUARANTEE FIXED WARM SERVERLESS CAPACITY?
Serverless compute for Jobs does not currently offer a configuration to reserve or guarantee a fixed number of warm instances. The serverless pool is managed entirely by Databricks, and while the platform optimizes for fast startup, there is no user-facing knob to pin a specific number of workers in a "ready" state.
If you need guaranteed warm instances with classic compute, instance pools are the mechanism for this. You can set a Min Idle value greater than zero, which keeps pre-provisioned cloud instances ready so that clusters backed by that pool start in seconds rather than minutes. This does not apply to serverless, but it is the closest equivalent for classic compute.
Docs: https://docs.databricks.com/aws/en/compute/pool-best-practices
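As a sketch, the pool settings could be expressed as a payload for the Instance Pools create endpoint (POST /api/2.0/instance-pools/create); the field names follow the Instance Pools API, but all values below are illustrative, not recommendations.

```python
# Illustrative instance-pool definition with warm (Min Idle) instances.
pool_spec = {
    "instance_pool_name": "warm-workers",         # illustrative name
    "node_type_id": "i3.xlarge",                  # pick per workload
    "min_idle_instances": 3,                      # keep 3 instances warm
    "max_capacity": 40,                           # cap for peak bursts
    "idle_instance_autotermination_minutes": 60,  # bridge gaps between events
}
```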
QUESTION 2: IS MAINTAINING LONG-RUNNING SERVERLESS JOB INSTANCES SUPPORTED?
Long-running serverless Jobs runs are technically possible, but polling a Delta table for task assignment (as your current design does) adds significant latency due to the read-process-write cycle and is not an efficient use of serverless resources.
For always-on workloads, the Continuous trigger type on Jobs is the supported mechanism. When you set a job to Continuous trigger, Databricks automatically restarts the job immediately after each run completes (typically within 60 seconds). However, note that continuous trigger mode does not currently support serverless compute. You would need to use classic compute (ideally backed by an instance pool for fast restarts) for continuous jobs.
Docs: https://docs.databricks.com/aws/en/jobs/continuous
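As a sketch, a Continuous-trigger job definition for the Jobs API 2.1 (POST /api/2.1/jobs/create) might look like the payload below; the task key, notebook path, and cluster key are placeholders.

```python
# Illustrative job settings: the "continuous" block is what makes
# Databricks restart the job automatically after each run completes.
job_spec = {
    "name": "always-on-stream",
    "continuous": {"pause_status": "UNPAUSED"},  # auto-restart after each run
    "tasks": [
        {
            "task_key": "process_events",
            "notebook_task": {"notebook_path": "/Repos/team/process_events"},
            "job_cluster_key": "warm_cluster",   # classic compute, not serverless
        }
    ],
}
```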
QUESTION 3: BEST PRACTICES FOR SUB-SECOND / NEAR-REAL-TIME EXECUTION
For your use case of event-driven, low-latency processing, here are the recommended patterns in order of increasing latency:
1. Model Serving Endpoints (lowest latency, sub-100ms)
If your per-event processing logic can be expressed as a function or model, Mosaic AI Model Serving endpoints are purpose-built for this. They run on serverless infrastructure, auto-scale based on demand, and add under 50 ms of serving overhead. You can deploy custom Python functions (not just ML models) as serving endpoints. This is often the best fit for event-driven, request/response workloads.
Docs: https://docs.databricks.com/aws/en/machine-learning/model-serving
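As a minimal illustration of the "custom Python function" idea, the per-event logic could start as a plain function, which you would then wrap (for example, in an mlflow.pyfunc.PythonModel) and deploy behind a serving endpoint; score_event, its fields, and the threshold are invented for this sketch.

```python
# Illustrative per-event handler: a placeholder for your processing logic.
# In practice this function body would live inside the serving model's
# predict method and be called once per request at the endpoint.
def score_event(event: dict) -> dict:
    """Toy request/response logic: flag events over an amount threshold."""
    amount = float(event.get("amount", 0.0))
    return {
        "call_id": event.get("call_id"),
        "flagged": amount > 1000.0,  # placeholder business rule
    }
```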
2. Spark Structured Streaming with Short Trigger Intervals
Rather than triggering a new job per event, use a single long-running Structured Streaming job with a short processingTime trigger (e.g., 1 second or even 500ms). This approach keeps the cluster warm continuously and processes micro-batches of events as they arrive. Pair this with:
- Classic compute backed by an instance pool (Min Idle > 0) for guaranteed warm resources
- Continuous trigger on the job so the stream auto-restarts on failure
This pattern gives you near-real-time processing (low single-digit seconds) without the cold start problem entirely, because the cluster never shuts down.
3. Lakeflow Spark Declarative Pipelines (SDP) in Continuous Mode
If your workload fits the declarative pipeline model, SDP supports continuous execution with automatic scaling. For serverless SDP pipelines, the "Performance optimized" setting reduces startup time for triggered pipelines. For always-on streaming, continuous SDP pipelines keep processing running without restarts.
Docs: https://docs.databricks.com/aws/en/delta-live-tables/serverless-dlt.html
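For reference, the continuous-versus-triggered choice lives in the pipeline settings; a sketch of that settings payload is below, with all names and paths as placeholders.

```python
# Illustrative pipeline settings: "continuous": True keeps the pipeline
# running instead of starting and stopping per trigger.
pipeline_spec = {
    "name": "events-sdp",
    "continuous": True,    # always-on streaming execution
    "serverless": True,    # serverless compute for the pipeline
    "libraries": [{"notebook": {"path": "/Repos/team/events_pipeline"}}],
}
```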
4. Instance Pools with Classic Compute Jobs
If you must use the Jobs + per-event model, switch from serverless to classic compute backed by an instance pool. Configure:
- Min Idle instances: 3 (matches your desired warm worker count)
- On-demand instances (not spot) for predictable availability
- Idle Instance Auto Termination: set high enough to bridge gaps between events
This gives you cluster startup times of seconds rather than minutes.
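As a sketch, a job's cluster definition would draw workers from such a pool via instance_pool_id; all ids and values below are placeholders, not tested settings.

```python
# Illustrative new_cluster block for a job backed by an instance pool.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",    # illustrative runtime version
    "instance_pool_id": "pool-0123456789",  # placeholder: the warm pool's id
    "num_workers": 3,                       # matches the desired warm worker count
}
```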
RECOMMENDED ARCHITECTURE
For a fintech-style low-latency event processing system, the typical pattern is:
Event Source --> Kafka/Kinesis/Event Hubs
                      |
                      v
Structured Streaming Job (continuous, classic compute + instance pool)
                      |
                      v
Delta Tables (results / state)
This eliminates the cold start problem entirely because the streaming job runs continuously on warm compute. Events flow through the stream and are processed within the trigger interval you configure.
If you need per-event request/response (synchronous) rather than stream processing, Model Serving endpoints are the right tool. You can wrap your processing logic in a custom model and serve it at sub-100ms latency.
I would recommend moving away from the polling-based approach with the executor_pool_log Delta table, as Delta table polling inherently adds seconds of latency per cycle. The streaming or serving patterns above are designed specifically for the latency requirements you described.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.