<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Apply change data with delete and schema evolution in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12068#M6935</link>
    <description>&lt;P&gt;Please go through this documentation: &lt;A href="https://docs.delta.io/latest/api/python/index.html" target="test_blank"&gt;https://docs.delta.io/latest/api/python/index.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 01 Sep 2022 07:33:09 GMT</pubDate>
    <dc:creator>User16753725469</dc:creator>
    <dc:date>2022-09-01T07:33:09Z</dc:date>
    <item>
      <title>Apply change data with delete and schema evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12065#M6932</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Currently, I'm using Structured Streaming to insert, update, and delete rows in a table. A row is deleted if the value in the 'Operation' column is 'deleted'. Everything works fine until a new column appears.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since I don't need the 'Operation' column in the target table, I use &lt;I&gt;whenMatchedUpdate(set=..)&lt;/I&gt; and &lt;I&gt;whenNotMatchedInsert(values=..)&lt;/I&gt; instead of &lt;I&gt;whenMatchedUpdateAll()&lt;/I&gt; and &lt;I&gt;whenNotMatchedInsertAll()&lt;/I&gt;. However, according to the documentation, schema evolution occurs only when there is an&amp;nbsp;updateAll action, an&amp;nbsp;insertAll action, or both. The 'Operation' column also can't be dropped beforehand, since it's needed in the merge (delete) condition.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there any way to automatically add a new column and also drop some columns before merging?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2022 11:56:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12065#M6932</guid>
      <dc:creator>noimeta</dc:creator>
      <dc:date>2022-07-28T11:56:56Z</dc:date>
    </item>
    <item>
      <title>Re: Apply change data with delete and schema evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12066#M6933</link>
      <description>&lt;P&gt;To help with this case, I would need to see more details, including sample data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You could also use Delta Live Tables: its new apply_changes function could be an excellent fit for your case. &lt;A href="https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-cdc.html" target="test_blank"&gt;https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-cdc.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 11:16:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12066#M6933</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-29T11:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: Apply change data with delete and schema evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12067#M6934</link>
      <description>&lt;P&gt;Thank you for your answer. I haven't tried Delta Live Tables yet, but it's in our future plans.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Anyway, the sample data looks something like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;bronze table&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screen Shot 2565-08-01 at 10.02.13"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1673i4BA0F8C0A9D80D01/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2565-08-01 at 10.02.13" alt="Screen Shot 2565-08-01 at 10.02.13" /&gt;&lt;/span&gt;silver table&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screen Shot 2565-08-01 at 10.02.37"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1692i7C148B32C2D21946/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2565-08-01 at 10.02.37" alt="Screen Shot 2565-08-01 at 10.02.37" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then the schema of the bronze table was automatically updated with a new column:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screen Shot 2565-08-01 at 10.03.30"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1675iB3980E613C9E5A36/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2565-08-01 at 10.03.30" alt="Screen Shot 2565-08-01 at 10.03.30" /&gt;&lt;/span&gt;This is the result I want for the silver table:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screen Shot 2565-08-01 at 10.03.51"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1670i1D481D4E11F349DF/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2565-08-01 at 10.03.51" alt="Screen Shot 2565-08-01 at 10.03.51" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Currently, I have to update the schema of the silver table manually.&lt;/P&gt;&lt;P&gt;If I use &lt;I&gt;whenMatchedUpdateAll()&lt;/I&gt;&amp;nbsp;and&amp;nbsp;&lt;I&gt;whenNotMatchedInsertAll()&lt;/I&gt;, the &lt;I&gt;Op&lt;/I&gt; column will be added to the silver table.&lt;/P&gt;&lt;P&gt;If I use &lt;I&gt;whenMatchedUpdate()&lt;/I&gt;&amp;nbsp;and&amp;nbsp;&lt;I&gt;whenNotMatchedInsert()&lt;/I&gt;, the column &lt;I&gt;a1&lt;/I&gt; won't be added to the table.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2022 03:10:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12067#M6934</guid>
      <dc:creator>noimeta</dc:creator>
      <dc:date>2022-08-01T03:10:14Z</dc:date>
    </item>
    <item>
      <title>Re: Apply change data with delete and schema evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12068#M6935</link>
      <description>&lt;P&gt;Please go through this documentation: &lt;A href="https://docs.delta.io/latest/api/python/index.html" target="test_blank"&gt;https://docs.delta.io/latest/api/python/index.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 07:33:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12068#M6935</guid>
      <dc:creator>User16753725469</dc:creator>
      <dc:date>2022-09-01T07:33:09Z</dc:date>
    </item>
    <item>
      <title>Re: Apply change data with delete and schema evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12069#M6936</link>
      <description>&lt;P&gt;Thank you for the documentation. It's very helpful.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From the docs, I thought I would be able to use&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from delta.tables import DeltaTable

# Parentheses let the chained builder calls span several lines
deltaTable = (
    DeltaTable.replace(sparkSession)
    .tableName("testTable")
    .addColumns(df.schema)
    .execute()
)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;to update the schema in code whenever a schema change is detected.&lt;/P&gt;&lt;P&gt;However, this code replaces the table, so not only does the schema get updated, but all the existing data is lost as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you have any other suggestions?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 08:41:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/apply-change-data-with-delete-and-schema-evolution/m-p/12069#M6936</guid>
      <dc:creator>noimeta</dc:creator>
      <dc:date>2022-09-01T08:41:24Z</dc:date>
    </item>
  </channel>
</rss>

