Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need help with DLT

Fatimah-Tariq
New Contributor II

I have a DLT pipeline on Databricks that has been running for months, and I just found out that there has been an issue with the logic in the silver layer; as a result, the tables in my silver schema now contain faulty records. The silver layer tables are STREAMING TABLES.
I want to fix this issue in DLT without creating a new set of tables. What I need to do is read from my streaming tables in silver, apply my logic update, delete the faulty records from the same streaming tables in silver, and then write the correct records back to the same silver layer streaming schema. (In other words: update the silver schema.)
Is it possible to do it this way, i.e., reading from, updating, and writing to the same schema in the context of DLT?
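
In plain (non-DLT) PySpark terms, the operation I'm after is roughly the sketch below; the table name and the faulty-record predicate are placeholders, not my actual logic:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder predicate identifying the records the faulty logic produced
faulty_condition = "<condition>"

# 1. Read the current silver streaming table (a Delta table underneath)
silver_df = spark.read.table("silver_schema.streaming_table")

# 2. Recompute the affected records with the corrected logic
corrected_df = silver_df.filter(faulty_condition)
# ... apply the fixed transformation to corrected_df here ...

# 3. Delete the faulty records from the same table
spark.sql(f"DELETE FROM silver_schema.streaming_table WHERE {faulty_condition}")

# 4. Write the corrected records back to the same table
corrected_df.write.mode("append").saveAsTable("silver_schema.streaming_table")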

Any quick help would be highly appreciated.

2 REPLIES

filipniziol
Contributor III

Hi @Fatimah-Tariq,

What about defining your DLT pipeline as below?
This way you will create a streaming table that reads from your silver table, applies all the needed changes, and then writes back to your silver table.

%sql
-- Read from the streaming table in the silver schema
CREATE OR REFRESH STREAMING TABLE silver_stream
AS SELECT *
FROM STREAM(silver_schema.streaming_table);

-- Apply your logic update
-- ... perform necessary transformations ...

-- Delete faulty records
DELETE FROM silver_stream
WHERE <condition>;

-- Write the correct records back to the silver schema
CREATE OR REFRESH STREAMING TABLE silver_schema.streaming_table
AS SELECT *
FROM STREAM(silver_stream);

 

Fatimah-Tariq
New Contributor II

That did not help. It keeps giving me an "Unable to resolve the flow" error for all the tables in the silver schema.

I'm attempting to do it another way now: creating a simple notebook, moving all my deletion logic into that notebook (in the context of DLT), and then attaching that notebook to my pipeline, like this:
Staging -> Bronze -> (New) Notebook with deletion logic -> Silver -> Gold.

This structure runs fine without any errors, but it is not calling the function that I defined in my new notebook. The pipeline goes into that notebook but comes out without ever invoking the function written in the context of @dlt.view (a rough sketch of the notebook follows below).
Can you help me understand this behavior? Why is this happening, and is what I'm doing right or not?
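
For reference, the deletion notebook is shaped roughly like this; the view name, upstream table name, and predicate are placeholders rather than my exact code:

import dlt
from pyspark.sql import functions as F

# Placeholder predicate identifying the faulty records
FAULTY_CONDITION = "<condition>"

@dlt.view(name="deletion_view")  # placeholder view name
def deletion_view():
    # Read the bronze output and filter out the faulty records.
    # NOTE: if no downstream dataset selects from this view, DLT may
    # never invoke this function at all.
    df = dlt.read_stream("bronze_table")  # placeholder upstream table
    return df.filter(~F.expr(FAULTY_CONDITION))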
