<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Table Upsert fails when source attributes are missing in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/91733#M38248</link>
    <description>&lt;P&gt;How do I set&amp;nbsp;&lt;STRONG&gt;Existing Optional Columns?&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 25 Sep 2024 13:43:44 GMT</pubDate>
    <dc:creator>JohnM256</dc:creator>
    <dc:date>2024-09-25T13:43:44Z</dc:date>
    <item>
      <title>Delta Table Upsert fails when source attributes are missing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/60448#M31671</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am trying to merge a json to delta table. Since the Json is basically from MongoDB which does not have a schema, there are chances of having missing attributes expected by delta table schema validation. Schema Evolution is enabled as well. How do i handle this ? Is there a way to add these missing attributes and default values or disable schema validation for such cases. I am using Delta table framework for scala.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2024 22:39:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/60448#M31671</guid>
      <dc:creator>MustangR</dc:creator>
      <dc:date>2024-02-16T22:39:51Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Table Upsert fails when source attributes are missing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/60955#M31699</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99966"&gt;@MustangR&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for bringing up your concern,&amp;nbsp;&lt;SPAN&gt;You're right,&lt;/SPAN&gt;&lt;SPAN&gt; merging JSON data from MongoDB with a Delta table can pose challenges due to schema differences.&lt;/SPAN&gt;&lt;SPAN&gt; Here are two ways which I could think of to handle this situation:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. Leveraging Schema Evolution:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Automatic Schema Evolution:&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;When schema evolution is enabled,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;Delta table allows adding new columns during writes.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;However,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;it treats missing&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;required&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;columns differently:&lt;/SPAN&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Schema on Read vs. Write:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If the missing column is present in the Delta table schema used for reading but absent in the write schema,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;the merge will&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;fail&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;New Required Column:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If the missing column is a newly introduced requirement in the Delta table schema,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;the merge will&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;fail&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class="lia-indent-padding-left-30px"&gt;How to solve this problem:&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Existing Optional Columns:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;Missing optional columns will have NULL values during merge,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;no further action needed.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;New Optional Columns:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If you anticipate encountering new optional columns in the JSON data,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;add them to your Delta table schema with NULL default values.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;This allows automatic handling by schema evolution.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Required Columns Present but not in Write Schema:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;Modify your write process to include the required column from the Delta table schema.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Required Columns Newly Introduced:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If the requirement is recent,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;consider adding the column to the Delta table schema with a suitable default value to allow future merges.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;You can then backfill existing data if needed.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;2. Preprocessing JSON Data:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Identify Missing Attributes:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;Before merging,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;analyze your MongoDB JSON data to identify potentially missing attributes compared to the Delta table schema.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Add Missing Attributes:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;Add missing attributes to the JSON data with default values before writing them to Delta.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;You can use libraries like&amp;nbsp;&lt;/SPAN&gt;spark-sql-catalyst&lt;SPAN&gt;&amp;nbsp;to perform schema-aware transformations.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Handle Schema Violations:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If you cannot add missing attributes,&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;you can create a separate "error" table to store rows that violate the Delta table schema.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Feb 2024 18:38:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/60955#M31699</guid>
      <dc:creator>Palash01</dc:creator>
      <dc:date>2024-02-17T18:38:24Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Table Upsert fails when source attributes are missing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/91733#M38248</link>
      <description>&lt;P&gt;How do I set&amp;nbsp;&lt;STRONG&gt;Existing Optional Columns?&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Sep 2024 13:43:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-upsert-fails-when-source-attributes-are-missing/m-p/91733#M38248</guid>
      <dc:creator>JohnM256</dc:creator>
      <dc:date>2024-09-25T13:43:44Z</dc:date>
    </item>
  </channel>
</rss>

