We’re building a low-latency processing pipeline on Databricks and are running into serverless cold-start constraints.
We ingest events (calls) continuously via a Spark Structured Streaming listener.
For each event, we trigger a serverless job run that must start immediately.
Serverless Jobs cold starts typically take 15–25 seconds, which is too slow for our use case.
To mitigate this, we attempted to keep 3 “idle” workers always running so that processing can begin immediately without paying the serverless startup penalty.
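To make the warm-pool idea concrete, here is a minimal sketch of the maintainer loop we run, assuming the Databricks SDK for Python; `POOL_JOB_ID` and the pool size of 3 are our own parameters, not anything Databricks prescribes. The core sizing logic is factored out so it can be reasoned about separately from the API calls.

```python
# Sketch of a warm-pool maintainer (assumption: databricks-sdk is installed
# and POOL_JOB_ID points at a job whose task simply waits for an assignment).

def runs_to_launch(desired: int, alive: int) -> int:
    """How many replacement runs to start to restore the pool size."""
    return max(desired - alive, 0)

def maintain_pool(client, pool_job_id: int, desired: int = 3) -> None:
    """Top up the pool of idle 'worker' runs for the given job.

    `client` is assumed to be a databricks.sdk.WorkspaceClient; each launched
    run pays the serverless cold start once, then sits idle awaiting work.
    """
    alive = sum(1 for _ in client.jobs.list_runs(job_id=pool_job_id,
                                                 active_only=True))
    for _ in range(runs_to_launch(desired, alive)):
        client.jobs.run_now(job_id=pool_job_id)
```

We invoke `maintain_pool` on a schedule so that any worker that picks up an assignment is promptly replaced by a fresh idle run.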
What we tried
A pool model backed by a Delta table (executor_pool_log) where:
- We keep 3 workers alive at all times by starting Jobs runs that wait for assignment.
- Workers poll the Delta table via Spark; the polling itself adds latency, and on serverless it ends up slower than the cold starts we were trying to avoid.
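The worker side of the model above looks roughly like this sketch. The claim step is injected as a callable because in our setup it is a Spark/Delta operation (an atomic `UPDATE ... WHERE status = 'pending'`-style claim against executor_pool_log); the table, column names, and intervals here are our assumptions, not a Databricks API.

```python
import time

def poll_for_assignment(claim_once, interval_s: float = 0.5,
                        timeout_s: float = 60.0):
    """Poll until claim_once() returns an assignment or we time out.

    claim_once is expected to atomically flip one pending row in the pool
    table to 'claimed' and return the claimed payload, or None if nothing
    is pending. Returns None on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        assignment = claim_once()
        if assignment is not None:
            return assignment
        time.sleep(interval_s)  # this sleep/poll cycle is the latency cost
    return None
```

Even with a short `interval_s`, each poll is a full Spark read against the Delta table, which is where the extra delay on serverless comes from.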
What we’re trying to understand
- Is there any supported way to guarantee a fixed amount of warm serverless compute (e.g. 3 ready workers) for Jobs workloads?
- Is keeping long-running serverless job runs alive an intended or recommended pattern?
- Are there known best practices for sub-second / near-real-time job execution on Databricks today (ideally serverless)?