Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I have created a materialized view using a Delta Live Tables pipeline and it is not appending data

zero234
New Contributor III
I have created a materialized view using a Delta Live Tables pipeline, and for some reason it overwrites the data every day instead of appending to it. For example, if the table had 8 million records and I run the pipeline, it removes those previous records and only puts in the new ones. I want it to append to the already existing data. I have tried using @dlt.table(mergeMode="append"), but it throws an "unexpected keyword argument" error.

What can I do so my pipeline appends data?

 

1 ACCEPTED SOLUTION


Kaniz_Fatma
Community Manager

Hi @zero234, to ensure that your Delta Live Tables pipeline appends data instead of overwriting it, you can use the @append_flow decorator.

Here are the steps:

  1. Use @append_flow: to add data to a streaming table without triggering a full refresh, define the write as an append flow with @dlt.append_flow(target="<table>"). Multiple flows can append into the same target (see the sketch after this list).

  2. Define Materialized Views or Streaming Tables: a materialized view (@dlt.table over a static read) is recomputed on each pipeline update, while a streaming table processes only new input and appends it to the existing data.

  3. Example Usage:

    import dlt

    @dlt.table
    def my_materialized_view():
        # Static (batch) read: recomputed on every pipeline update
        return spark.read.table("my_source_data")

    # Create the streaming table that the append flow writes into
    dlt.create_streaming_table("my_streaming_table")

    @dlt.append_flow(target="my_streaming_table")
    def my_streaming_pipeline():
        # Streaming read: each update appends only new records
        return spark.readStream.table("my_source_data")

  4. Override Default Behavior: if the pipeline still rebuilds the table, check that you are not starting the update with the "Full refresh" option, which recomputes all tables from scratch.
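For completeness, a single streaming table can also receive appends from several flows, e.g. a one-off backfill next to the main feed. A minimal sketch (the table and source names are placeholders, assuming the current dlt Python API):

    import dlt

    # The target streaming table that both flows append into
    dlt.create_streaming_table("events")

    @dlt.append_flow(target="events")
    def current_events():
        # Main feed: appends new records on each update
        return spark.readStream.table("events_current")

    @dlt.append_flow(target="events")
    def backfill_events():
        # One-off historical load appended into the same target
        return spark.readStream.table("events_history")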

 


3 REPLIES


Kasen
New Contributor III

Hi @Kaniz_Fatma,

In my DLT pipeline, I'm using DLT Classic Core as the resource. When I run the DLT pipeline (creating the silver layer from bronze) for the first time, it creates a materialized view in the silver layer. When some rows in the bronze layer have been updated and I re-run the DLT pipeline, I see that the data in the silver layer does reflect the latest changes from bronze. What I'm not clear on is whether the materialized view in the silver layer is doing a full refresh or just updating the rows that have changed. I couldn't find any source on this topic, especially since I'm using DLT Classic Core in the pipeline without CDC. I'd appreciate your clarification, thank you!

kulkpd
Contributor

@zero234 ,

Adding some suggestions based on the answer from @Kaniz_Fatma. An important point to note here: "To define a materialized view in Python, apply @table to a query that performs a static read against a data source. To define a streaming table, apply @table to a query that performs a streaming read against a data source."

I think if you switch to reading in streaming mode, DLT will treat your destination as a streaming table.
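As a rough illustration of that quoted distinction, the only difference on the read side is spark.read versus spark.readStream; the source table name below is a placeholder:

    import dlt

    @dlt.table
    def silver_mv():
        # Static (batch) read -> DLT manages this as a materialized
        # view, recomputed from the source on each pipeline update
        return spark.read.table("my_source_data")

    @dlt.table
    def silver_st():
        # Streaming read -> DLT manages this as a streaming table,
        # which only processes and appends new source records
        return spark.readStream.table("my_source_data")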

 
