<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes' in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13285#M7989</link>
    <description>&lt;P&gt;Hey there Community!! I'm using dlt.apply_changes in my DLT job as follows:&lt;/P&gt;&lt;PRE&gt;dlt.apply_changes(
  target = "employee_silver",
  source = "employee_bronze_clean_v",
  keys = ["EMPLOYEE_ID"],
  sequence_by = col("last_updated"),
  apply_as_deletes = expr("Op = 'D'"),
  except_column_list = ["Op", "_rescued_data"])&lt;/PRE&gt;&lt;P&gt;The apply_as_deletes line isn't working. I'm using AWS Database Migration Service (DMS) in ongoing replication mode; it writes CSV files to S3 with an Op column whose value is one of I / U / D (Insert, Update, Delete respectively). I can't figure out why my delete transactions aren't being processed by this pipeline.&lt;/P&gt;&lt;P&gt;Any insight is GREATLY appreciated! thanks!&lt;/P&gt;</description>
    <pubDate>Sun, 17 Jul 2022 13:53:46 GMT</pubDate>
    <dc:creator>BradSheridan</dc:creator>
    <dc:date>2022-07-17T13:53:46Z</dc:date>
    <item>
      <title>CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'</title>
      <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13285#M7989</link>
      <description>&lt;P&gt;Hey there Community!! I'm using dlt.apply_changes in my DLT job as follows:&lt;/P&gt;&lt;PRE&gt;dlt.apply_changes(
  target = "employee_silver",
  source = "employee_bronze_clean_v",
  keys = ["EMPLOYEE_ID"],
  sequence_by = col("last_updated"),
  apply_as_deletes = expr("Op = 'D'"),
  except_column_list = ["Op", "_rescued_data"])&lt;/PRE&gt;&lt;P&gt;The apply_as_deletes line isn't working. I'm using AWS Database Migration Service (DMS) in ongoing replication mode; it writes CSV files to S3 with an Op column whose value is one of I / U / D (Insert, Update, Delete respectively). I can't figure out why my delete transactions aren't being processed by this pipeline.&lt;/P&gt;&lt;P&gt;Any insight is GREATLY appreciated! thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2022 13:53:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13285#M7989</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-17T13:53:46Z</dc:date>
    </item>
    <item>
      <title>Re: CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'</title>
      <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13286#M7990</link>
      <description>&lt;P&gt;First, try expr("Operation = 'DELETE'") for your &lt;B&gt;apply_as_deletes&lt;/B&gt;.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2022 15:09:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13286#M7990</guid>
      <dc:creator>axb0</dc:creator>
      <dc:date>2022-07-17T15:09:59Z</dc:date>
    </item>
    <item>
      <title>Re: CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'</title>
      <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13287#M7991</link>
      <description>&lt;P&gt;OK... will give it a try momentarily and report back here.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2022 15:10:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13287#M7991</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-17T15:10:57Z</dc:date>
    </item>
    <item>
      <title>Re: CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'</title>
      <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13288#M7992</link>
      <description>&lt;P&gt;@Alex Barreto&amp;nbsp;I gave that a shot and it errored out saying no column DELETE exists. Here is the CDC file that I'm loading (fake data, of course):&lt;/P&gt;&lt;PRE&gt;Op,EMPLOYEE_ID,FIRST_NAME,LAST_NAME,EMAIL,PHONE_NUMBER,HIRE_DATE,JOB_ID,SALARY,COMMISSION_PCT,MANAGER_ID,DEPARTMENT_ID,last_updated
D,206,William,Gietz,WGIETZ,515.123.8181,2002-06-07,AC_ACCOUNT,8300,,205,110,2021-10-06 21:26:04&lt;/PRE&gt;&lt;P&gt;So I have to use expr("Op = 'D'"), and that's what isn't working. When I use Databricks SQL to query my Bronze table, I see two rows for this primary key (206): one with an 'I' (for the initial insert) and one with a 'D' (for the delete transaction from the source, MySQL Aurora). But it's that 'D' row that isn't flowing into my Silver table.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2022 17:08:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13288#M7992</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-17T17:08:10Z</dc:date>
    </item>
    <item>
      <title>Re: CDC with Delta Live Tables, with AutoLoader, isn't applying 'deletes'</title>
      <link>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13289#M7993</link>
      <description>&lt;P&gt;@Alex Barreto&amp;nbsp;Think I got it now!! I was originally using sequence_by = col("createDate") in my apply_changes code, which comes from the source table. However, I realized that I have an intermediate dlt.view function to do some rudimentary data quality checks (dlt.expect), and in that function I also do this:&lt;/P&gt;&lt;PRE&gt;def employee_bronze_clean_v():
  return dlt.read_stream("employee_bronze") \
    .withColumn('inputFileName', F.input_file_name()) \
    .withColumn('LoadDate', F.lit(datetime.now()))&lt;/PRE&gt;&lt;P&gt;I then used sequence_by = col("LoadDate") in the dlt.apply_changes and presto... the row is gone from my Silver table!!!&lt;/P&gt;&lt;P&gt;But is it really gone? Is it still available via time travel? Does it still exist in the Parquet files? I worry about this for privacy laws like GDPR, CCPA, etc.&lt;/P&gt;&lt;P&gt;p.s. thanks again for helping out!&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2022 17:17:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cdc-with-delta-live-tables-with-autoloader-isn-t-applying/m-p/13289#M7993</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-17T17:17:31Z</dc:date>
    </item>
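    <!--
    The resolution in the thread hinges on how apply_changes orders change rows: for each key, only the row with the highest sequence_by value wins, and if that winning row matches apply_as_deletes, the key is dropped from the target. A minimal pure-Python sketch of that semantics (this is an illustrative emulation, not the DLT API itself; the column names and sample values mirror the thread):

    ```python
    def apply_changes(rows, key, sequence_by, delete_when):
        # Keep, for each key, only the row with the highest sequence value.
        latest = {}
        for row in rows:
            k = row[key]
            if k not in latest or row[sequence_by] > latest[k][sequence_by]:
                latest[k] = row
        # A matching delete only takes effect if it won the per-key ordering.
        return [r for r in latest.values() if not delete_when(r)]

    changes = [
        {"Op": "I", "EMPLOYEE_ID": 206, "last_updated": "2021-10-06 21:20:00"},
        {"Op": "D", "EMPLOYEE_ID": 206, "last_updated": "2021-10-06 21:26:04"},
    ]

    # The delete carries the highest last_updated, so key 206 is removed.
    print(apply_changes(changes, "EMPLOYEE_ID", "last_updated",
                        lambda r: r["Op"] == "D"))
    ```

    One plausible reading of the thread: with a sequence column that does not strictly increase across the 'I' and 'D' rows, the 'D' row can lose the per-key ordering, which is why switching sequence_by to a monotonically increasing LoadDate made the delete apply.
    -->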
  </channel>
</rss>

