07-09-2024 09:22 AM
Hi there, this is a follow-up to a discussion I started last month:
Solved: Re: DLT Compute: "Ephemeral" Job Compute vs. All-p... - Databricks Community - 71661
Based on what was discussed, I understand that it's not possible to use "All Purpose Clusters" with DLT Pipelines. I would like to understand WHY this is the case. I'm not sure I follow why Databricks wouldn't allow this as a possible implementation, since the "Ephemeral" Job Compute clusters effectively always cost us more: they require spinning up new resources when we already have All Purpose Clusters up and running.
Is there something I'm missing here?
07-10-2024 01:58 AM
Hi @ChristianRRL, There are a few key reasons why DLT pipelines cannot use all-purpose clusters:
So in summary, the ephemeral nature of job clusters, isolation requirements, library management, and pricing differences make them a better fit for DLT pipelines than using a shared all-purpose cluster. The cost of spinning up new job clusters is offset by the benefits of a dedicated, isolated environment optimized for the pipeline workload.
07-11-2024 07:16 AM
Good morning @Kaniz_Fatma, I think most of these points make sense, particularly running pipelines in a "fully isolated environment". I can understand that this can be a best practice (or in this case the only practice) allowed by Databricks, but I'm still somewhat confused as to why there isn't at least an option to leverage the all-purpose clusters with DLT jobs (even if just as a non-default option). Out of curiosity, do you know if there's been any discussion within Databricks about making this possible in the future?
Additionally, with respect to point (5), which says that data analytics (all-purpose) workloads are subject to "different pricing" than data engineering (task) job workloads: how might I best compare pricing between the two? For example, at the moment DLT is effectively *only* adding costs, since our existing setup assumes that the all-purpose clusters are in a sense "set in stone", so any additional compute such as the task job clusters comes on top of what we already pay for all-purpose. If we had a better idea of what kind of cost savings we might get with DLT job clusters compared with all-purpose clusters, we might be able to shift some compute load out of all-purpose and concretely save on costs rather than just adding to them.
07-23-2024 06:59 AM
@Kaniz_Fatma / @raphaelblg quick follow-up on this one. Wondering if anyone can provide a bit more feedback on the last points I wrote.
07-23-2024 07:43 AM
@ChristianRRL regarding why DLT doesn't allow you to use all-purpose clusters:
1. The DLT runtime is derived from the shared compute DBR, but it is not the same runtime and has different features than the common all-purpose runtime. A DLT pipeline cannot be executed on any of the all-purpose cluster runtimes.
2. DLT is a different product than all-purpose compute, with different prices.
Feel free to use our Pricing Calculator to compare prices. At the moment, if you run the exact same workload on DLT, with the same driver and worker instance types and the same number of workers, it should bill you fewer DBUs than on all-purpose.
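To make the comparison concrete, here is a minimal Python sketch of the arithmetic, not an official pricing model. Every number in it is a made-up placeholder (the DBUs per hour and the price per DBU are NOT real Databricks rates), and the names `ComputeOption` and `net_monthly_change` are hypothetical; plug in the figures the Pricing Calculator gives you for your cloud, region, and instance types. It only covers DBU charges, not the cloud provider's VM costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeOption:
    """One way of running a workload, described by placeholder billing figures."""
    name: str
    dbus_per_hour: float  # DBUs the cluster consumes per hour (placeholder value)
    usd_per_dbu: float    # price of one DBU in USD (placeholder value)

    def cost(self, hours: float) -> float:
        """DBU charge in USD for running this option for `hours` hours."""
        return self.dbus_per_hour * self.usd_per_dbu * hours


def net_monthly_change(all_purpose: ComputeOption, dlt: ComputeOption,
                       dlt_hours: float, all_purpose_hours_freed: float) -> float:
    """Change in the monthly DBU bill after moving a workload to DLT.

    Positive means the bill goes up, negative means it goes down.
    `all_purpose_hours_freed` is how many all-purpose hours you actually
    stop paying for once the workload has moved.
    """
    return dlt.cost(dlt_hours) - all_purpose.cost(all_purpose_hours_freed)


# Placeholder figures for illustration only; replace with Pricing Calculator values.
ALL_PURPOSE = ComputeOption("all-purpose", dbus_per_hour=10.0, usd_per_dbu=0.50)
DLT = ComputeOption("dlt", dbus_per_hour=10.0, usd_per_dbu=0.25)

if __name__ == "__main__":
    # Case 1: the all-purpose cluster keeps running just as long as before,
    # so the DLT hours are purely additional cost.
    print(net_monthly_change(ALL_PURPOSE, DLT, dlt_hours=40, all_purpose_hours_freed=0))
    # Case 2: the 40 hours are genuinely removed from the all-purpose cluster.
    print(net_monthly_change(ALL_PURPOSE, DLT, dlt_hours=40, all_purpose_hours_freed=40))
```

The second argument pair is the important design point: DLT only saves money if the all-purpose cluster's running hours actually shrink once the workload moves. If the all-purpose cluster stays up for the same number of hours regardless, the DLT cost is simply added on top, which matches the situation described earlier in the thread.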