
Legacy Autoscaling (Workflows) vs. Enhanced Autoscaling (DLT)

chsoni12
New Contributor II

I conducted a proof of concept (POC) comparing the performance of a DLT pipeline and a Databricks Workflow running the same workload, task, code, and cluster configuration. Both were set up with autoscaling enabled, with a minimum of 1 worker node and a maximum of 5 worker nodes.

The differences were as follows:

  1. DLT used enhanced autoscaling, while the Databricks Workflow used standard (legacy) autoscaling (see the configuration sketch after this list).

  2. I created a Delta table using the Databricks Workflow and a materialized view using DLT.
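
For reference, here is roughly what the two autoscaling configurations looked like, expressed as Python dicts matching the Databricks REST API shapes (the Jobs API new_cluster spec and the DLT pipeline settings). The runtime version, node type, and pipeline name below are placeholders, not my actual values:

```python
# Workflow job cluster: standard (legacy) autoscaling. The cluster manager
# scales on load metrics alone, with no knowledge of the task's structure.
workflow_cluster = {
    "spark_version": "14.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder node type
    "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
    },
}

# DLT pipeline cluster: enhanced autoscaling is opted into by setting
# autoscale.mode to "ENHANCED" in the pipeline's cluster settings.
dlt_pipeline_settings = {
    "name": "poc-dlt-pipeline",  # placeholder pipeline name
    "clusters": [
        {
            "label": "default",
            "autoscale": {
                "min_workers": 1,
                "max_workers": 5,
                "mode": "ENHANCED",
            },
        }
    ],
}
```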

Results:

I ran both the pipeline and the workflow and observed that the DLT pipeline completed in 9.5 minutes, whereas the Databricks Workflow took 14.32 minutes. Additionally, the cost of running the DLT pipeline was lower.

Upon reviewing the logs, I found that the Databricks Workflow's cluster first upscaled from 1 worker to 19 workers, and then from 19 workers to 50. In contrast, the DLT pipeline completed the entire process with only 5 worker nodes, scaling from 1 up to 5.
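
For anyone who wants to reproduce the log check: the resize history can be pulled from the Clusters events API (POST /api/2.0/clusters/events). A rough sketch, with the workspace host, token, and cluster ID left as placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

# Ask only for RESIZING events, which record each target worker count
# the autoscaler chose for the cluster.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": "<job-cluster-id>", "event_types": ["RESIZING"]},
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    print(event["timestamp"], event.get("details", {}))
```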

Questions:

  1. Why did the DLT pipeline complete with only 5 worker nodes, while the Databricks Workflow required up to 50 worker nodes?

  2. How do autoscaling and enhanced autoscaling function in the background, and what accounts for the observed differences in scaling behavior?

1 REPLY

Brahmareddy
Honored Contributor III

Hi chsoni12,

How are you doing today? That's a great observation, and it's awesome that you're testing performance and cost between DLT and regular Workflows. As I understand it, the key difference lies in how the two autoscaling modes work.

DLT pipelines use enhanced autoscaling, which is smarter about how it monitors the job and allocates resources: it scales more conservatively and efficiently, based on the actual workload backlog and the pipeline's DAG structure. Standard autoscaling, used in regular Workflows, can be more aggressive: it may ramp up quickly based on load spikes or resource estimates, even if not all of those nodes end up fully utilized. That is likely why your Workflow shot up to 50 workers while DLT comfortably completed the same task with just 5.

The DLT engine is also optimized for Delta operations, and materialized views can reuse cached metadata and lineage, which speeds things up further (see the sketch below for how the same logic looks in each). So yes, enhanced autoscaling in DLT helps save both time and cost, especially for structured pipelines like yours.
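
To make that concrete, here is a rough sketch of the two ways the same aggregation would be expressed. Table names like raw_events and daily_counts are placeholders I made up, not from your POC:

```python
# --- DLT version: @dlt.table over a batch read produces a materialized
# --- view, and the DLT engine plans scaling around the pipeline's DAG.
import dlt
from pyspark.sql import functions as F

# `spark` is the SparkSession provided by the Databricks runtime.

@dlt.table(name="daily_counts")
def daily_counts():
    return (
        spark.read.table("raw_events")  # batch read -> materialized view
        .groupBy("event_date")
        .agg(F.count("*").alias("n_events"))
    )

# --- Workflow version: the same logic in a plain notebook task that
# --- writes a Delta table; scaling is left to the job cluster's
# --- standard autoscaler, which reacts only to load metrics.
(
    spark.read.table("raw_events")
    .groupBy("event_date")
    .agg(F.count("*").alias("n_events"))
    .write.mode("overwrite")
    .saveAsTable("daily_counts")
)
```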

Regards,

Brahma
