Autoscaling with Auto Loader without SDP
05-11-2026 11:45 PM
Hi there,
I have a question about using Auto Loader without SDP together with cluster autoscaling. I'm reading the following in the docs:
- Production considerations for Structured Streaming | Databricks on AWS: "Do not enable autoscaling for compute for Structured Streaming jobs."
- Configure Auto Loader for production workloads | Databricks on AWS: "Enhanced autoscaling implements optimization of streaming workloads and adds enhancements to improve the performance of batch workloads. Enhanced autoscaling optimizes costs by adding or removing machines as the workload changes."
- But also: "Compute auto-scaling has limitations when scaling down cluster size for structured streaming workloads. Databricks recommends using Lakeflow Spark Declarative Pipelines with enhanced autoscaling for streaming workloads."
We don't use SDP because of serverless limitations. Is it not advised to use enhanced autoscaling for non-SDP jobs? And why is that?
05-11-2026 11:54 PM
And to add to the question: what if I have a job with 10 tasks that all use Auto Loader? Would that benefit from autoscaling?
05-12-2026 04:30 AM
Hello!
In Databricks there are two different autoscaling mechanisms in play here:
- Classic compute autoscaling on a job or all-purpose cluster: this is the autoscaling you enable on a classic job cluster by setting min and max workers. For Structured Streaming jobs it is not recommended, because scale-down has limitations for streaming workloads. The cluster may not scale down as expected, and resizing adds latency, especially for stateful streams.
- Enhanced autoscaling for Lakeflow Spark Declarative Pipelines (SDP): this is a pipeline-specific autoscaling mode that uses pipeline workload metrics such as task slot utilization and queued tasks. It is better optimized for streaming workloads and can proactively shut down underutilized nodes while avoiding failed tasks during shutdown.
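To make the difference between the two mechanisms concrete, here is a minimal sketch of where each one is configured. It assumes the `autoscale` block of a classic job cluster spec and the `clusters` entry of a pipeline's settings look the way I remember them from the Databricks docs (`min_workers`, `max_workers`, and `mode: "ENHANCED"` for pipelines). The function names are hypothetical, and the fragments leave out required fields such as the runtime version and node type, so check them against the current API reference before use.

```python
from typing import Any, Dict


def _check_bounds(min_workers: int, max_workers: int) -> None:
    # Both mechanisms take a lower and an upper bound on the worker count.
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")


def classic_autoscale_fragment(min_workers: int, max_workers: int) -> Dict[str, Any]:
    """Fragment of a classic job cluster spec with compute autoscaling.

    Assumption: this dict is merged into the job's "new_cluster" settings.
    """
    _check_bounds(min_workers, max_workers)
    return {"autoscale": {"min_workers": min_workers, "max_workers": max_workers}}


def pipeline_enhanced_autoscale_fragment(min_workers: int, max_workers: int) -> Dict[str, Any]:
    """One entry of a pipeline's "clusters" setting using enhanced autoscaling.

    Assumption: the pipeline settings accept a "mode" key inside "autoscale".
    """
    _check_bounds(min_workers, max_workers)
    return {
        "label": "default",
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
            "mode": "ENHANCED",
        },
    }
```

The only structural difference is the `mode` key: the classic cluster has no pipeline-level metrics to act on, which is why its scale-down behavior for streams is weaker.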
In short:
- for non-SDP continuous Auto Loader jobs, you can use fixed-size job compute
- for non-SDP Auto Loader jobs that use the available-now trigger, autoscaling can be reasonable
- for streaming autoscaling with better scale-down behavior, SDP enhanced autoscaling is the recommended option
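For the two non-SDP cases, the rule of thumb above could be sketched as a small helper that picks the sizing part of a job cluster spec. This is only an illustration: the function name and the trigger labels `"continuous"` and `"available_now"` are hypothetical, and the returned keys (`num_workers`, `autoscale`) assume the classic cluster spec format.

```python
from typing import Any, Dict, Optional


def job_compute_for_auto_loader(
    trigger: str, workers: int, max_workers: Optional[int] = None
) -> Dict[str, Any]:
    """Return the sizing part of a classic job cluster spec for an Auto Loader job.

    trigger: "continuous" for an always-on stream, "available_now" for a
             run that processes the backlog and then stops (hypothetical labels).
    """
    if workers < 0:
        raise ValueError("workers must be non-negative")
    if trigger == "continuous":
        # Always-on stream: fixed size, since classic scale-down is limited.
        return {"num_workers": workers}
    if trigger == "available_now":
        if max_workers is None or max_workers <= workers:
            # No headroom requested, so fall back to a fixed size.
            return {"num_workers": workers}
        # Batch-like run: the cluster ends with the run, so scale-down matters less.
        return {"autoscale": {"min_workers": workers, "max_workers": max_workers}}
    raise ValueError(f"unknown trigger: {trigger!r}")
```

For example, `job_compute_for_auto_loader("available_now", 2, 8)` yields an autoscaling range of 2 to 8 workers, while the same call with `"continuous"` ignores the upper bound and pins the cluster at 2 workers.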
Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP
05-12-2026 05:07 AM
Hi, thank you for your answer. Could you elaborate a bit on this?
for non SDP available now auto loader jobs autoscaling can be reasonable
How do you decide whether it is reasonable or not? Especially since you said that enabling compute autoscaling is not recommended because scale-down has limitations for streaming workloads.