<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Table - not reading the changed record from cloud file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/51000#M28929</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Neither a DLT stream nor native Spark Structured Streaming will pick up the fact that an existing record has changed; they can only read newly arriving data.&lt;/P&gt;&lt;P&gt;1. If you want to keep incremental loading and read only newly added data, remove this option from your pipeline:&lt;/P&gt;&lt;PRE&gt;.option("readchangeFeed","true")&lt;/PRE&gt;&lt;P&gt;and check that your pipeline works by adding an additional file to this location:&lt;/P&gt;&lt;PRE&gt;path="/mnt/saphana-adls-landing/saphana-adls-landing/customer_landing"&lt;/PRE&gt;&lt;P&gt;2. If you don't care about incremental loading but do care about changed data, you can do a full reload by changing&lt;/P&gt;&lt;PRE&gt;spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")&lt;/PRE&gt;&lt;P&gt;to:&lt;/P&gt;&lt;PRE&gt;spark.read.csv()&lt;/PRE&gt;&lt;P&gt;3. There is also a feature called Change Data Feed, but it is more advanced, and I don't think it is what you are looking for. You can read more about it here: &lt;A href="https://docs.databricks.com/en/delta/delta-change-data-feed.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/delta-change-data-feed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;</description>
    <pubDate>Sun, 12 Nov 2023 20:20:53 GMT</pubDate>
    <dc:creator>Emil_Kaminski</dc:creator>
    <dc:date>2023-11-12T20:20:53Z</dc:date>
    <item>
      <title>Delta Live Table - not reading the changed record from cloud file</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/50997#M28926</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to ingest data from a cloud file into a bronze table. The DLT pipeline works the first time and loads the data into the bronze table, but when I add a new record and change a field in an existing record, the pipeline succeeds yet shows 0 records processed, when it should have inserted 1 record and updated 1 record.&lt;/P&gt;&lt;P&gt;My code is below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType(
    [
        StructField('customer_id', StringType(), True),
        StructField('customer_name', StringType(), True),
        StructField('customer_phone', StringType(), True),
        StructField('operation_date', StringType(), True)
    ]
)

path = "/mnt/saphana-adls-landing/saphana-adls-landing/customer_landing"

@dlt.table(comment="load bronze customer table from adls datalake landing zone",
           path="/mnt/saphana-adls-landing/saphana-adls-landing/delta/bronze_customer")
def customer():
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .option("readchangeFeed","true")
            .option("ignoreChanges", "true")
            .schema(schema)
            .load(path)
    )&lt;/LI-CODE&gt;</description>
      <pubDate>Sun, 12 Nov 2023 18:49:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/50997#M28926</guid>
      <dc:creator>alj_a</dc:creator>
      <dc:date>2023-11-12T18:49:19Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table - not reading the changed record from cloud file</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/51000#M28929</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Neither a DLT stream nor native Spark Structured Streaming will pick up the fact that an existing record has changed; they can only read newly arriving data.&lt;/P&gt;&lt;P&gt;1. If you want to keep incremental loading and read only newly added data, remove this option from your pipeline:&lt;/P&gt;&lt;PRE&gt;.option("readchangeFeed","true")&lt;/PRE&gt;&lt;P&gt;and check that your pipeline works by adding an additional file to this location:&lt;/P&gt;&lt;PRE&gt;path="/mnt/saphana-adls-landing/saphana-adls-landing/customer_landing"&lt;/PRE&gt;&lt;P&gt;2. If you don't care about incremental loading but do care about changed data, you can do a full reload by changing&lt;/P&gt;&lt;PRE&gt;spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")&lt;/PRE&gt;&lt;P&gt;to:&lt;/P&gt;&lt;PRE&gt;spark.read.csv()&lt;/PRE&gt;&lt;P&gt;3. There is also a feature called Change Data Feed, but it is more advanced, and I don't think it is what you are looking for. You can read more about it here: &lt;A href="https://docs.databricks.com/en/delta/delta-change-data-feed.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/delta-change-data-feed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;</description>
      <pubDate>Sun, 12 Nov 2023 20:20:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/51000#M28929</guid>
      <dc:creator>Emil_Kaminski</dc:creator>
      <dc:date>2023-11-12T20:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table - not reading the changed record from cloud file</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/51019#M28938</link>
      <description>&lt;P&gt;Thank you, Emil. I tried all the suggestions. .read works fine; it picks up both new and changed data. But my problem is that the target is a bronze table, and in this case the bronze table ends up with duplicate records.&lt;/P&gt;&lt;P&gt;However, let me look at the other options: creating an intermediate table and applying CDC.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Nov 2023 03:06:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-not-reading-the-changed-record-from-cloud-file/m-p/51019#M28938</guid>
      <dc:creator>alj_a</dc:creator>
      <dc:date>2023-11-13T03:06:29Z</dc:date>
    </item>
  </channel>
</rss>

