Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoscaling with the autoloader without SDP

HTD360
New Contributor III

Hi there,

I have a question about using Auto Loader without SDP together with cluster autoscaling. I'm reading the following in the docs:

We don't use SDP because of serverless limitations. Is it not advised to use enhanced autoscaling for non-SDP jobs? And why is that?

 

 

3 REPLIES

HTD360
New Contributor III

And to add to the question: what if I have a job with 10 tasks that all use Auto Loader? Would that benefit from autoscaling?

amirabedhiafi
New Contributor III

Hello!

Databricks has two different autoscaling mechanisms here:

- Classic compute autoscaling on a job or all-purpose cluster: this is the autoscaling you enable on a classic job cluster by setting the minimum and maximum number of workers. For Structured Streaming jobs it is not recommended, because scale-down has limitations for streaming workloads: the cluster may not scale down as expected, and resizing can add latency, especially for stateful streams.

- Enhanced autoscaling for LSDP: this is a pipeline-specific autoscaling mode driven by pipeline workload metrics such as task slot usage and queued tasks. It is better optimized for streaming workloads and can proactively shut down underused nodes while avoiding failed tasks during shutdown.
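
To make the difference between the two mechanisms concrete, here is a minimal sketch that builds both configuration fragments as plain Python dictionaries. The field names (`num_workers`, `autoscale`, `min_workers`, `max_workers`, and `mode` set to `ENHANCED` in a pipeline `clusters` entry) follow the Jobs and pipeline settings JSON as I understand them, so please check them against your workspace. The runtime version, node type, worker counts, and the helper names `classic_job_cluster` and `pipeline_cluster` are placeholders made up for illustration.

```python
import json

# Placeholder values, assumed for illustration only.
SPARK_VERSION = "15.4.x-scala2.12"
NODE_TYPE_ID = "Standard_DS3_v2"


def classic_job_cluster(min_workers, max_workers=None):
    """Build a `new_cluster` block for a classic (non-SDP) job cluster.

    With only min_workers given, the cluster has a fixed size. Passing a
    larger max_workers switches it to classic compute autoscaling.
    """
    cluster = {"spark_version": SPARK_VERSION, "node_type_id": NODE_TYPE_ID}
    if max_workers is None or max_workers == min_workers:
        # Fixed-size compute: no autoscale block at all.
        cluster["num_workers"] = min_workers
        return cluster
    if max_workers < min_workers:
        raise ValueError("max_workers must be >= min_workers")
    cluster["autoscale"] = {
        "min_workers": min_workers,
        "max_workers": max_workers,
    }
    return cluster


def pipeline_cluster(min_workers, max_workers):
    """Build a pipeline `clusters` entry that requests enhanced autoscaling."""
    return {
        "label": "default",
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
            "mode": "ENHANCED",
        },
    }


if __name__ == "__main__":
    print(json.dumps(classic_job_cluster(4), indent=2))     # fixed size
    print(json.dumps(classic_job_cluster(2, 8), indent=2))  # classic autoscaling
    print(json.dumps(pipeline_cluster(1, 5), indent=2))     # enhanced autoscaling
```

The point of the sketch is only that enhanced autoscaling is requested in the pipeline settings, not in a job cluster definition, which is why it is not available to a non-SDP job.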

In short:

  • For non-SDP continuous Auto Loader jobs, you can use fixed-size job compute.
  • For non-SDP available-now Auto Loader jobs, autoscaling can be reasonable.
  • For streaming autoscaling with better scale-down behavior, LSDP autoscaling is the recommended option.
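
The "continuous" versus "available now" distinction above comes down to the trigger on the stream writer. Below is a rough sketch of an Auto Loader job that can run in either mode. The function name `start_auto_loader`, the JSON file format, the one-minute interval, and all paths and table names are assumptions for illustration, not something taken from your setup.

```python
def start_auto_loader(spark, source_path, schema_path, checkpoint_path,
                      target_table, continuous=False):
    """Start an Auto Loader stream into a table.

    continuous=False uses the availableNow trigger: the query processes
    everything that is currently pending and then stops, like a batch job.
    continuous=True keeps the query running on a fixed processing interval.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumed input format
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )

    writer = stream.writeStream.option("checkpointLocation", checkpoint_path)

    if continuous:
        # Long-running stream: the case where fixed-size compute is suggested.
        writer = writer.trigger(processingTime="1 minute")  # assumed interval
    else:
        # Bounded run: the case where classic autoscaling can be reasonable.
        writer = writer.trigger(availableNow=True)

    return writer.toTable(target_table)
```

On a Databricks cluster you would call it with the notebook's `spark` session, for example `start_auto_loader(spark, "/landing/events", "/schemas/events", "/checkpoints/events", "bronze.events")`, where every argument is a hypothetical location.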
If this answer resolves your question, could you please mark it as “Accept as Solution”? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

HTD360
New Contributor III

Hi, thank you for your answer. Could you elaborate a bit on this?
"For non-SDP available-now Auto Loader jobs, autoscaling can be reasonable"

How do you decide whether it is reasonable or not? Especially since you said it is not recommended to enable compute autoscaling because scale-down has limitations for streaming workloads.