
Legacy Autoscaling (Workflows) vs. Enhanced Autoscaling (DLT)

chsoni12
New Contributor II

I conducted a proof of concept (POC) comparing the performance of a DLT pipeline and a Databricks Workflow running the same workload, task, code, and cluster configuration. Both were set up with autoscaling enabled, with a minimum of 1 worker node and a maximum of 5 worker nodes.

The differences were as follows:

  1. DLT used enhanced autoscaling, while the Databricks Workflow used standard (legacy) autoscaling (see the configuration sketch after this list).

  2. I created a Delta table using the Databricks Workflow and a materialized view using DLT.
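
For reference, here is roughly what the two autoscaling configurations looked like, expressed as Python dicts matching the Databricks REST API shapes (the Jobs API new_cluster spec and the DLT pipeline settings). The runtime version, node type, and pipeline name below are placeholders, not my actual values:

```python
# Workflow job cluster: standard (legacy) autoscaling. The cluster manager
# scales on load metrics alone, with no knowledge of the task's structure.
workflow_cluster = {
    "spark_version": "14.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder node type
    "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
    },
}

# DLT pipeline cluster: enhanced autoscaling is opted into by setting
# autoscale.mode to "ENHANCED" in the pipeline's cluster settings.
dlt_pipeline_settings = {
    "name": "poc-dlt-pipeline",  # placeholder pipeline name
    "clusters": [
        {
            "label": "default",
            "autoscale": {
                "min_workers": 1,
                "max_workers": 5,
                "mode": "ENHANCED",
            },
        }
    ],
}
```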

Results:

I ran both the pipeline and the workflow and observed that the DLT pipeline completed in 9.5 minutes, whereas the Databricks Workflow took 14.32 minutes. Additionally, the cost of running the DLT pipeline was lower.

Upon reviewing the logs, I found that the Databricks Workflow's cluster first upscaled from 1 worker to 19 workers, and then from 19 workers to 50. In contrast, the DLT pipeline completed the entire process with only 5 worker nodes, scaling from 1 up to 5.
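
For anyone who wants to reproduce the log check: the resize history can be pulled from the Clusters events API (POST /api/2.0/clusters/events). A rough sketch, with the workspace host, token, and cluster ID left as placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

# Ask only for RESIZING events, which record each target worker count
# the autoscaler chose for the cluster.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": "<job-cluster-id>", "event_types": ["RESIZING"]},
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    print(event["timestamp"], event.get("details", {}))
```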

Questions:

  1. Why did the DLT pipeline complete with only 5 worker nodes, while the Databricks Workflow required up to 50 worker nodes?

  2. How do autoscaling and enhanced autoscaling function in the background, and what accounts for the observed differences in scaling behavior?

1 REPLY

Brahmareddy
Honored Contributor III

Hi chsoni12,

How are you doing today? That's a great observation, and it's awesome that you're testing performance and cost between DLT and regular Workflows. As I understand it, the key difference lies in how the two autoscaling modes work.

DLT pipelines use enhanced autoscaling, which is smarter about how it monitors the job and allocates resources: it scales more conservatively and efficiently, based on the actual workload backlog and the pipeline's DAG structure. Standard autoscaling, used in regular Workflows, can be more aggressive: it may ramp up quickly based on load spikes or resource estimates, even if not all of those nodes end up fully utilized. That is likely why your Workflow shot up to 50 workers while DLT comfortably completed the same task with just 5.

The DLT engine is also optimized for Delta operations, and materialized views can reuse cached metadata and lineage, which speeds things up further (see the sketch below for how the same logic looks in each). So yes, enhanced autoscaling in DLT helps save both time and cost, especially for structured pipelines like yours.
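
To make that concrete, here is a rough sketch of the two ways the same aggregation would be expressed. Table names like raw_events and daily_counts are placeholders I made up, not from your POC:

```python
# --- DLT version: @dlt.table over a batch read produces a materialized
# --- view, and the DLT engine plans scaling around the pipeline's DAG.
import dlt
from pyspark.sql import functions as F

# `spark` is the SparkSession provided by the Databricks runtime.

@dlt.table(name="daily_counts")
def daily_counts():
    return (
        spark.read.table("raw_events")  # batch read -> materialized view
        .groupBy("event_date")
        .agg(F.count("*").alias("n_events"))
    )

# --- Workflow version: the same logic in a plain notebook task that
# --- writes a Delta table; scaling is left to the job cluster's
# --- standard autoscaler, which reacts only to load metrics.
(
    spark.read.table("raw_events")
    .groupBy("event_date")
    .agg(F.count("*").alias("n_events"))
    .write.mode("overwrite")
    .saveAsTable("daily_counts")
)
```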

Regards,

Brahma
