Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT table deletion

habyphilipose
New Contributor II

If we delete a DLT pipeline, its tables are deleted along with it.
But in a DLT pipeline that creates 5 tables, if I comment out the logic for 1 table, that table is not deleted from the catalog, even after a full refresh of the pipeline.

Does anyone know why that happens? 
Is there any way to remove that 1 table from the catalog other than the DROP command, i.e. from the DLT pipeline itself?

3 REPLIES

kerem
Contributor

Hi @habyphilipose 

I believe this is due to the additive nature of the pipeline, which is designed only to add and update tables. I can imagine catastrophic data loss if a production table were deleted just because its code was commented out. So I believe this is by design: the pipeline preserves integrity by not tearing tables down based on code changes alone.

I believe your only option is to drop the table manually. 
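For reference, a manual drop would look something like this (the catalog, schema, and table names below are placeholders, not from the original question):

```sql
-- Hypothetical names: replace with your own catalog/schema/table.
DROP TABLE IF EXISTS my_catalog.my_schema.stale_table;
```

Note that for DLT-managed tables you may need to stop or delete the pipeline's ownership of the table first, depending on your workspace setup.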

Kerem

Advika
Databricks Employee

Hello @habyphilipose!

I agree with @kerem. However, Databricks now supports an auto-drop behaviour via the pipeline configuration "pipelines.dropInactiveTables": "true". When enabled, materialized views and streaming tables (MV/STs) that are no longer defined in the pipeline code are automatically removed from the catalog during the next pipeline update.
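As a sketch, the setting goes under the "configuration" map in the pipeline's JSON settings; the pipeline name and other fields here are illustrative, only the "pipelines.dropInactiveTables" key is from the post above:

```json
{
  "name": "my_dlt_pipeline",
  "configuration": {
    "pipelines.dropInactiveTables": "true"
  }
}
```

With this in place, removing (or commenting out) a table definition from the pipeline code should cause that table to be dropped from the catalog on the next update, rather than lingering as described in the question.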

MartinIsti
New Contributor III

Don't confuse DLT and LDP (Lakeflow Declarative Pipelines): although behind the scenes they work very similarly, the UI and the developer experience have changed immensely, and very important new features have been added. I used DLT extensively, and in a very dynamic way, where the tables to process came from an ever-changing metadata file. Let's say I ingested all 68 tables from AdventureWorks. If any table was removed from the metadata, it wasn't only skipped by DLT but removed entirely. That whole approach, where the DLT pipeline has ownership of the created objects, was a show-stopper for us, and we regularly asked Databricks to prioritise changing this behaviour.

To their credit, they admitted that there's a better way, and though it took time to change, the LDP version addresses this by separating flows from objects - see more details here: https://learn.microsoft.com/en-us/azure/databricks/dlt/concepts#key-concepts:

"A flow reads data from a source, applies user-defined processing logic, and writes the result into a target."

I only know this in theory so far, as I haven't had the chance to give it another go since it was announced.
