Databricks Community

ChristianRRL · ‎07-09-2024

Hi there, this is a follow-up from a discussion I started last month

Solved: Re: DLT Compute: "Ephemeral" Job Compute vs. All-p... - Databricks Community - 71661

Based on what was discussed, I understand that it's not possible to use "All Purpose Clusters" with DLT Pipelines. I would like to understand WHY this is the case? I'm not sure I follow why Databricks wouldn't allow this as a possible implementation since the "Ephemeral" Job Compute clusters effectively always cost more since they require spinning up new resources when we already have All Purpose Clusters up & running.

Is there something I'm missing here?

raphaelblg · ‎07-23-2024

@ChristianRRL regarding on why DLT doesn't allow you to use all-purpose clusters:

1. The DLT runtime is derived from the shared compute DBR, it's not the same runtime and has different features than the common all-purpose runtime. A DLT pipeline is not capable of being executed in any of the all-purpose cluster runtimes.

2. DLT is a different product than all-purpose compute, with different prices.

Feel free to use our Pricing Calculator to compare prices. At the current moment, if you run the exact same workload, with the same driver and workers instance types (and workers amount) on DLT, it should bill you with less DBUs than on all-purpose.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

View solution in original post

ChristianRRL · ‎07-11-2024

Good morning @Retired_mod, I think most of these points make sense, particularly running pipelines in a "fully isolated environment". I can understand that this can be a best practice (or in this case only practice) allowed by Databricks, but I'm still somewhat confused as to why there isn't at least an option to leverage the all-purpose clusters with DLT jobs (even if just as a non-default option). Out of curiosity, do you know if there's been any sort of discussion in Databricks to making this possible in the future?

Additionally, with respect to point (5) with data analytics (all-purpose) clusters and the job workloads being subject to "different pricing" than the data engineering (task) workloads, how might I best compare/contrast pricing between these two? For example, at the moment DLT is effectively *only* adding costs since our existing setup assumes that the all-purpose clusters are in a sense "set in stone" and any additional compute such as the task job clusters cost more since they are not using our existing all-purpose clusters. Maybe if we had a better idea as to what kind of cost savings we may get with DLT job clusters compared with all-purpose clusters, we may be able to shift some compute load out of all-purpose and more concretely save on costs rather than just adding to it.

ChristianRRL · ‎07-23-2024

@Retired_mod / @raphaelblg quick follow-up on this one. Wondering if anyone can provide a bit more feedback on the last points I wrote.

raphaelblg · ‎07-23-2024

@ChristianRRL regarding on why DLT doesn't allow you to use all-purpose clusters:

1. The DLT runtime is derived from the shared compute DBR, it's not the same runtime and has different features than the common all-purpose runtime. A DLT pipeline is not capable of being executed in any of the all-purpose cluster runtimes.

2. DLT is a different product than all-purpose compute, with different prices.

Feel free to use our Pricing Calculator to compare prices. At the current moment, if you run the exact same workload, with the same driver and workers instance types (and workers amount) on DLT, it should bill you with less DBUs than on all-purpose.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks