<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Issues with incremental data processing within Delta Live Tables in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/issues-with-incremental-data-processing-within-delta-live-tables/m-p/89429#M8298</link>
    <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I have a problem with incrementally processing data in my Delta Live Tables (DLT) pipeline. I have a raw file (in Delta format) where new data is added each day. When I run my DLT pipeline, I only want the new data to be processed. As an example, I made a pipeline with two notebooks. The first contains the following code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt

@dlt.table()
def VEHICLE_SALES_INVOICE_TRX():

  df = spark.sql("""
                                
    WITH AllCompanies as (
      SELECT
        TO_DATE( CAST( VI.Invoice_Date_KEY AS  string ),'yyyyMMdd' )                                AS Invoice_Date_KEY,
        VI.Invoice_Number                                                                           AS Invoice_Number,
        VI.Invoice_sequence                                                                         AS Invoice_sequence,	
        VI.Deliver_To_Customer_KEY                                                                  AS Deliver_To_Customer_KEY,
        VI.Customer_KEY                                                                             AS Customer_KEY,
        VI.SalesPerson_User_KEY                                                                     AS SalesPerson_User_KEY,
        VI.Stock_Vehicle_KEY                                                                        AS Stock_Vehicle_KEY,
        VI.SalesAmount
      FROM
       MAIN.RAW_DATA.EXTR_VEHICLE_INVOICE_part VI
      WHERE 
       VI.Invoice_Sequence = 1
    )

    /*
    more transformation...
    */

    SELECT
      Invoice_Date_KEY,
      Invoice_Number,
      Invoice_sequence,	
      Deliver_To_Customer_KEY,
      Customer_KEY,
      SalesPerson_User_KEY,
      Stock_Vehicle_KEY,
      SalesAmount
    FROM
      base b

  """)
  return df&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The second notebook defines another dlt.table() that does a SELECT * from the table above.&lt;/P&gt;&lt;P&gt;I ran the pipeline as a full refresh, then appended some new rows to the&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;MAIN&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;RAW_DATA&lt;/SPAN&gt;&lt;SPAN&gt;.EXTR_VEHICLE_INVOICE_part table, and ran an update on the pipeline. However, all the rows were still processed, not just the new rows.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I also tried to define the raw table as a streaming table, but after I appended rows to the Delta table and ran an update, I got the following error:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;An error occured because we detected an update or delete to one or more rows in the source table. Streaming tables&lt;BR /&gt;may only use append-only streaming sources. If you expect to delete or update rows to the source table in the future, please convert your table&lt;BR /&gt;to a materialized view instead.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I tested the pipeline in both the Pro and Advanced product editions, and the Databricks runtime is 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Does anyone have any insight into what I am doing wrong? My understanding is that Delta Live Tables should be able to handle this sort of incremental processing.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Kind regards,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Andreas&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Wed, 11 Sep 2024 08:58:45 GMT</pubDate>
    <dc:creator>AndreasB2</dc:creator>
    <dc:date>2024-09-11T08:58:45Z</dc:date>
    <item>
      <title>Issues with incremental data processing within Delta Live Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issues-with-incremental-data-processing-within-delta-live-tables/m-p/89429#M8298</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I have a problem with incrementally processing data in my Delta Live Tables (DLT) pipeline. I have a raw file (in Delta format) where new data is added each day. When I run my DLT pipeline, I only want the new data to be processed. As an example, I made a pipeline with two notebooks. The first contains the following code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt

@dlt.table()
def VEHICLE_SALES_INVOICE_TRX():

  df = spark.sql("""
                                
    WITH AllCompanies as (
      SELECT
        TO_DATE( CAST( VI.Invoice_Date_KEY AS  string ),'yyyyMMdd' )                                AS Invoice_Date_KEY,
        VI.Invoice_Number                                                                           AS Invoice_Number,
        VI.Invoice_sequence                                                                         AS Invoice_sequence,	
        VI.Deliver_To_Customer_KEY                                                                  AS Deliver_To_Customer_KEY,
        VI.Customer_KEY                                                                             AS Customer_KEY,
        VI.SalesPerson_User_KEY                                                                     AS SalesPerson_User_KEY,
        VI.Stock_Vehicle_KEY                                                                        AS Stock_Vehicle_KEY,
        VI.SalesAmount
      FROM
       MAIN.RAW_DATA.EXTR_VEHICLE_INVOICE_part VI
      WHERE 
       VI.Invoice_Sequence = 1
    )

    /*
    more transformation...
    */

    SELECT
      Invoice_Date_KEY,
      Invoice_Number,
      Invoice_sequence,	
      Deliver_To_Customer_KEY,
      Customer_KEY,
      SalesPerson_User_KEY,
      Stock_Vehicle_KEY,
      SalesAmount
    FROM
      base b

  """)
  return df&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The second notebook defines another dlt.table() that does a SELECT * from the table above.&lt;/P&gt;&lt;P&gt;I ran the pipeline as a full refresh, then appended some new rows to the&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;MAIN&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;RAW_DATA&lt;/SPAN&gt;&lt;SPAN&gt;.EXTR_VEHICLE_INVOICE_part table, and ran an update on the pipeline. However, all the rows were still processed, not just the new rows.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I also tried to define the raw table as a streaming table, but after I appended rows to the Delta table and ran an update, I got the following error:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;An error occured because we detected an update or delete to one or more rows in the source table. Streaming tables&lt;BR /&gt;may only use append-only streaming sources. If you expect to delete or update rows to the source table in the future, please convert your table&lt;BR /&gt;to a materialized view instead.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I tested the pipeline in both the Pro and Advanced product editions, and the Databricks runtime is 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Does anyone have any insight into what I am doing wrong? My understanding is that Delta Live Tables should be able to handle this sort of incremental processing.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Kind regards,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Andreas&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 11 Sep 2024 08:58:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issues-with-incremental-data-processing-within-delta-live-tables/m-p/89429#M8298</guid>
      <dc:creator>AndreasB2</dc:creator>
      <dc:date>2024-09-11T08:58:45Z</dc:date>
    </item>
  </channel>
</rss>

