Hi @fintech_latency,

SteveOstrowski · ‎03-08-2026

This is a common architectural challenge when you need sub-second or near-real-time event processing. Let me walk through your three questions and then outline the recommended patterns.

QUESTION 1: CAN YOU GUARANTEE FIXED WARM SERVERLESS CAPACITY?

Serverless compute for Jobs does not currently offer a configuration to reserve or guarantee a fixed number of warm instances. The serverless pool is managed entirely by Databricks, and while the platform optimizes for fast startup, there is no user-facing knob to pin a specific number of workers in a "ready" state.

If you need guaranteed warm instances with classic compute, instance pools are the mechanism for this. You can set a Min Idle value greater than zero, which keeps pre-provisioned cloud instances ready so that clusters backed by that pool start in seconds rather than minutes. This does not apply to serverless, but it is the closest equivalent for classic compute.

Docs: https://docs.databricks.com/aws/en/compute/pool-best-practices

QUESTION 2: IS MAINTAINING LONG-RUNNING SERVERLESS JOB INSTANCES SUPPORTED?

Long-running serverless job runs are technically possible, but polling a Delta table for task assignment (as your current design does) adds latency on every read-process-write cycle and is not an efficient use of serverless resources.

For always-on workloads, the Continuous trigger type on Jobs is the supported mechanism. When you set a job to the Continuous trigger, Databricks keeps one run active at all times: when a run completes or fails, a new run starts automatically, typically within 60 seconds. Note, however, that continuous trigger mode does not currently support serverless compute, so you would need classic compute (ideally backed by an instance pool for fast restarts) for continuous jobs.

Docs: https://docs.databricks.com/aws/en/jobs/continuous
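
As a rough illustration of the continuous setup described above, here is a minimal sketch of a Jobs API 2.1 settings payload that combines the `continuous` block with a job cluster drawn from an instance pool. The function name, task key, cluster key, and all argument values are placeholders I made up for the example; check the field names against the current Jobs API reference before relying on them.

```python
import json


def continuous_job_settings(name, notebook_path, instance_pool_id,
                            spark_version, num_workers=2):
    """Build a job settings dict for a continuously triggered job.

    Assumption: the payload is sent to the Jobs API 2.1 create endpoint.
    No network call is made here; this only assembles the dict.
    """
    return {
        "name": name,
        # Continuous trigger instead of a cron schedule.
        "continuous": {"pause_status": "UNPAUSED"},
        # A continuous job keeps a single active run.
        "max_concurrent_runs": 1,
        "job_clusters": [{
            "job_cluster_key": "warm_pool_cluster",  # hypothetical key
            "new_cluster": {
                "spark_version": spark_version,
                # Workers come from the pool, so no node_type_id here.
                "instance_pool_id": instance_pool_id,
                "num_workers": num_workers,
            },
        }],
        "tasks": [{
            "task_key": "process_events",  # hypothetical key
            "job_cluster_key": "warm_pool_cluster",
            "notebook_task": {"notebook_path": notebook_path},
        }],
    }


if __name__ == "__main__":
    settings = continuous_job_settings(
        name="event-processor",                 # placeholder
        notebook_path="/Workspace/placeholder/stream_events",
        instance_pool_id="<your-pool-id>",      # placeholder
        spark_version="15.4.x-scala2.12",       # placeholder runtime
    )
    print(json.dumps(settings, indent=2))
```

The resulting JSON can be submitted through whichever client you already use (REST, CLI, SDK, or Terraform); the sketch deliberately stops short of making the call.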

QUESTION 3: BEST PRACTICES FOR SUB-SECOND / NEAR-REAL-TIME EXECUTION

For your use case of event-driven, low-latency processing, here are the recommended patterns in order of increasing latency:

1. Model Serving Endpoints (lowest latency, sub-100ms)
If your per-event processing logic can be expressed as a function or model, Mosaic AI Model Serving endpoints are purpose-built for this. They run on serverless infrastructure, scale automatically with demand, and add less than 50 ms of overhead latency on top of your function's own execution time. You can deploy custom Python functions (not just ML models) as serving endpoints. This is often the best fit for event-driven, request/response workloads.

Docs: https://docs.databricks.com/aws/en/machine-learning/model-serving
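
To make the request/response pattern concrete, here is a minimal client-side sketch that builds a scoring request for a serving endpoint using only the Python standard library. The workspace URL, endpoint name, token, and event fields are all placeholders, and the `dataframe_records` input format is one of several the endpoint accepts, so treat this as an assumption to verify against your own endpoint's signature.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request to a serving endpoint.

    Assumption: the endpoint accepts the `dataframe_records` input format,
    i.e. a list of dicts with one dict per event.
    """
    url = (f"{workspace_url.rstrip('/')}"
           f"/serving-endpoints/{endpoint_name}/invocations")
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_invocation_request(
        "https://example.cloud.databricks.com",  # placeholder workspace
        "event-processor",                       # hypothetical endpoint
        "<token>",                               # placeholder credential
        [{"event_id": "e-1", "amount": 12.5}],   # made-up event fields
    )
    print(req.get_method(), req.full_url)
    # To actually call the endpoint:
    # with urllib.request.urlopen(req, timeout=5) as resp:
    #     print(json.load(resp))
```

Measuring round-trip time around the `urlopen` call from the same region as the workspace is a quick way to check whether the latency target is realistic for your payload size.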

2. Spark Structured Streaming with Short Trigger Intervals
Rather than triggering a new job per event, use a single long-running Structured Streaming job with a short processingTime trigger (e.g., 1 second or even 500ms). This approach keeps the cluster warm continuously and processes micro-batches of events as they arrive. Pair this with:

- Classic compute backed by an instance pool (Min Idle > 0) for guaranteed warm resources
- Continuous trigger on the job so the stream auto-restarts on failure

This pattern gives you near-real-time processing (low single-digit seconds) and avoids the cold start problem entirely, because the cluster never shuts down.
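
The streaming pattern above can be sketched roughly as follows, assuming a Kafka source and a Delta target table. The function names, topic, table name, and checkpoint path are hypothetical, and `spark` is assumed to be the session Databricks provides in a notebook or job; adapt the source options for Kinesis or Event Hubs.

```python
def processing_time(interval_ms):
    """Format a micro-batch interval for trigger(processingTime=...)."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return f"{interval_ms} milliseconds"


def start_event_stream(spark, bootstrap_servers, topic, target_table,
                       checkpoint_path, interval_ms=500):
    """Start one long-running stream from Kafka into a Delta table.

    Assumption: `spark` is an active SparkSession on a cluster that has
    the Kafka connector available (as Databricks runtimes do).
    """
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers key/value as binary; cast them before processing.
    parsed = events.selectExpr(
        "CAST(key AS STRING) AS event_key",
        "CAST(value AS STRING) AS payload",
        "timestamp",
    )

    # One micro-batch per interval; the cluster stays up between batches.
    return (
        parsed.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=processing_time(interval_ms))
        .toTable(target_table)
    )


# Example call inside a Databricks notebook or job task (placeholders):
# query = start_event_stream(spark, "broker:9092", "events",
#                            "main.default.processed_events",
#                            "/Volumes/main/default/chk/events")
# query.awaitTermination()
```

The per-event logic would replace the simple `selectExpr` step. Note that the effective latency is bounded below by how long each micro-batch takes, so a 500 ms trigger only helps if batches finish faster than that.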

3. Lakeflow Spark Declarative Pipelines (SDP) in Continuous Mode
If your workload fits the declarative pipeline model, SDP supports continuous execution with automatic scaling. For serverless SDP pipelines, the "Performance optimized" setting reduces startup time for triggered pipelines. For always-on streaming, continuous SDP pipelines keep processing running without restarts.

Docs: https://docs.databricks.com/aws/en/delta-live-tables/serverless-dlt.html
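
For reference, continuous mode is a pipeline-level setting rather than something expressed in the table definitions. A minimal sketch of the relevant settings, assuming the Pipelines API field names `continuous` and `serverless`, might look like this. The pipeline name and notebook path are placeholders, and a real pipeline also needs a catalog and target schema, which are omitted here.

```python
import json


def continuous_pipeline_settings(name, notebook_path, serverless=True):
    """Build a partial pipeline settings dict with continuous mode on.

    Assumption: field names follow the Pipelines API; this is not a
    complete spec (catalog and target schema are left out on purpose).
    """
    return {
        "name": name,
        "serverless": serverless,
        # True = keep processing as data arrives; False = triggered runs.
        "continuous": True,
        "libraries": [{"notebook": {"path": notebook_path}}],
    }


if __name__ == "__main__":
    settings = continuous_pipeline_settings(
        "event-pipeline",                        # placeholder
        "/Workspace/placeholder/event_tables",   # placeholder
    )
    print(json.dumps(settings, indent=2))
```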

4. Instance Pools with Classic Compute Jobs
If you must use the Jobs + per-event model, switch from serverless to classic compute backed by an instance pool. Configure:

- Min Idle instances: 3 (matches your desired warm worker count)
- On-demand instances (not spot) for predictable availability
- Idle Instance Auto Termination: set high enough to bridge gaps between events

This gives you cluster startup times of seconds rather than minutes.
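
The pool configuration listed above could be expressed as an Instance Pools API payload along these lines. This is a sketch under the assumption that you are on AWS (the linked docs are the AWS edition); the pool name, node type, and the 60-minute idle timeout are placeholder values, not recommendations.

```python
import json


def warm_pool_spec(name, node_type_id, min_idle=3,
                   idle_autotermination_minutes=60):
    """Build an instance pool spec that keeps `min_idle` instances warm.

    Assumption: AWS workspace, so on-demand capacity is requested via
    aws_attributes. No API call is made here.
    """
    if min_idle < 1:
        raise ValueError("min_idle must be >= 1 to keep warm capacity")
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Instances kept provisioned even when no cluster is using them.
        "min_idle_instances": min_idle,
        # Applies to idle instances above the min_idle floor.
        "idle_instance_autotermination_minutes": idle_autotermination_minutes,
        # On-demand rather than spot, for predictable availability.
        "aws_attributes": {"availability": "ON_DEMAND"},
    }


if __name__ == "__main__":
    spec = warm_pool_spec("warm-event-pool", "m5.xlarge")  # placeholders
    print(json.dumps(spec, indent=2))
```

Keep in mind that idle pool instances incur cloud provider charges even when no cluster is attached, so the Min Idle value is a direct cost/latency trade-off.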

RECOMMENDED ARCHITECTURE

For a fintech-style low-latency event processing system, the typical pattern is:

Event Source --> Kafka/Kinesis/Event Hubs
   |
   v
Structured Streaming Job (continuous, classic compute + instance pool)
   |
   v
Delta Tables (results / state)

This eliminates the cold start problem entirely because the streaming job runs continuously on warm compute. Events flow through the stream and are processed within the trigger interval you configure.

If you need per-event request/response (synchronous) rather than stream processing, Model Serving endpoints are the right tool. You can wrap your processing logic in a custom model and serve it at sub-100ms latency.

I would recommend moving away from the polling-based approach with the executor_pool_log Delta table, as Delta table polling inherently adds seconds of latency per cycle. The streaming or serving patterns above are designed specifically for the latency requirements you described.
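
To see why polling puts a floor under latency, here is a small back-of-the-envelope sketch. It assumes events arrive at random times relative to the poll cycle, so an event waits half a poll interval on average (and up to a full interval) before it is even noticed, plus the read, process, and write time for the cycle. The numbers in the example are made up purely for illustration and are not measurements of your system.

```python
def polling_latency_seconds(poll_interval_s, read_s, process_s, write_s):
    """Return (mean, worst_case) end-to-end latency for a polling loop.

    Assumption: events arrive uniformly at random within a poll interval,
    so the average wait before pickup is half the interval.
    """
    cycle_cost = read_s + process_s + write_s
    mean = poll_interval_s / 2 + cycle_cost
    worst_case = poll_interval_s + cycle_cost
    return mean, worst_case


if __name__ == "__main__":
    # Hypothetical values: 2 s poll interval, 0.5 s table read,
    # 0.25 s processing, 0.5 s table write.
    mean, worst = polling_latency_seconds(2.0, 0.5, 0.25, 0.5)
    print(f"mean={mean:.2f}s worst={worst:.2f}s")
```

Shortening the poll interval reduces the wait term but not the read and write cost per cycle, which is why a streaming or serving design tends to be the better lever here.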

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.