<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Update target column with optional source columns in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/update-target-column-with-optional-source-columns/m-p/111082#M43783</link>
    <description>&lt;P&gt;Hi Ruc&lt;/P&gt;&lt;P&gt;How are you doing today?, as per my understanding,&amp;nbsp;Since your source table has a ReqDetails column with dynamic key-value pairs and only ProductID is fixed, you can parse the ReqDetails column dynamically and merge it into the target table using PySpark and Delta Lake. First, read the source data into a DataFrame and use a UDF (User-Defined Function) to split ReqDetails into a dictionary of key-value pairs. Then, extract only the available keys (e.g., ProductName, Price, ProductParentgroup) and create new columns dynamically. Finally, use Delta Lake’s MERGE operation to update only the modified values in the target table, using COALESCE() to retain existing data when a column is missing in the request. This way, you avoid a rigid schema while ensuring that updates happen correctly. Hope this helps!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
    <pubDate>Mon, 24 Feb 2025 20:21:14 GMT</pubDate>
    <dc:creator>Brahmareddy</dc:creator>
    <dc:date>2025-02-24T20:21:14Z</dc:date>
    <item>
      <title>Update target column with optional source columns</title>
      <link>https://community.databricks.com/t5/data-engineering/update-target-column-with-optional-source-columns/m-p/110535#M43601</link>
      <description>&lt;P&gt;I am trying to load on prem transaction log table and update a databricks table.In Source table,the column reqdetails table hold all the information and expect ProductID remaining columns are dynamic(not all columns exists in the request) all time the optional/nullable.Could you please let me the process using python in databricks ? Explored StructType but it needs mandatory specific columns.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Source Table&lt;/STRONG&gt;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="25%" height="30px"&gt;Req ID&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;Type&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;Req Details&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;Status&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="30px"&gt;1&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;Update&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;ProductID=234;ProductName=LawnMover;Price=58&lt;/TD&gt;&lt;TD width="25%" height="30px"&gt;True&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="45px"&gt;2&lt;/TD&gt;&lt;TD width="25%" height="45px"&gt;Update&lt;/TD&gt;&lt;TD width="25%" height="45px"&gt;ProductID=874;Price=478&lt;/TD&gt;&lt;TD width="25%" height="45px"&gt;True&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="19px"&gt;3&lt;/TD&gt;&lt;TD width="25%" height="19px"&gt;Update&lt;/TD&gt;&lt;TD width="25%" height="19px"&gt;ProductID=678;ProductParentgroup=Watersuppuly;Price=1.6&lt;/TD&gt;&lt;TD width="25%" height="19px"&gt;True&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;STRONG&gt;Target table before Update&lt;/STRONG&gt;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="25%"&gt;ProductID&lt;/TD&gt;&lt;TD width="25%"&gt;ProductParentgroup&lt;/TD&gt;&lt;TD width="25%"&gt;ProductName&lt;/TD&gt;&lt;TD width="25%"&gt;Price&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;234&lt;/TD&gt;&lt;TD width="25%"&gt;Utility&lt;/TD&gt;&lt;TD width="25%"&gt;Mover&lt;/TD&gt;&lt;TD width="25%"&gt;86&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;874&lt;/TD&gt;&lt;TD width="25%"&gt;HOA&lt;/TD&gt;&lt;TD width="25%"&gt;Sink&lt;/TD&gt;&lt;TD width="25%"&gt;450&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;678&lt;/TD&gt;&lt;TD width="25%"&gt;Water&lt;/TD&gt;&lt;TD width="25%"&gt;Filters&lt;/TD&gt;&lt;TD width="25%"&gt;1.2&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;STRONG&gt;Target table after Update&lt;/STRONG&gt;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="25%"&gt;ProductID&lt;/TD&gt;&lt;TD width="25%"&gt;ProductParentgroup&lt;/TD&gt;&lt;TD width="25%"&gt;ProductName&lt;/TD&gt;&lt;TD width="25%"&gt;Price&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;234&lt;/TD&gt;&lt;TD width="25%"&gt;Utility&lt;/TD&gt;&lt;TD width="25%"&gt;LawnMover&lt;/TD&gt;&lt;TD width="25%"&gt;58&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;874&lt;/TD&gt;&lt;TD width="25%"&gt;HOA&lt;/TD&gt;&lt;TD width="25%"&gt;Sink&lt;/TD&gt;&lt;TD width="25%"&gt;478&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;678&lt;/TD&gt;&lt;TD width="25%"&gt;Watersupply&lt;/TD&gt;&lt;TD width="25%"&gt;Filters&lt;/TD&gt;&lt;TD width="25%"&gt;1.6&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 21:33:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-target-column-with-optional-source-columns/m-p/110535#M43601</guid>
      <dc:creator>RUC</dc:creator>
      <dc:date>2025-02-18T21:33:47Z</dc:date>
    </item>
    <item>
      <title>Re: Update target column with optional source columns</title>
      <link>https://community.databricks.com/t5/data-engineering/update-target-column-with-optional-source-columns/m-p/111082#M43783</link>
      <description>&lt;P&gt;Hi Ruc&lt;/P&gt;&lt;P&gt;How are you doing today?, as per my understanding,&amp;nbsp;Since your source table has a ReqDetails column with dynamic key-value pairs and only ProductID is fixed, you can parse the ReqDetails column dynamically and merge it into the target table using PySpark and Delta Lake. First, read the source data into a DataFrame and use a UDF (User-Defined Function) to split ReqDetails into a dictionary of key-value pairs. Then, extract only the available keys (e.g., ProductName, Price, ProductParentgroup) and create new columns dynamically. Finally, use Delta Lake’s MERGE operation to update only the modified values in the target table, using COALESCE() to retain existing data when a column is missing in the request. This way, you avoid a rigid schema while ensuring that updates happen correctly. Hope this helps!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Mon, 24 Feb 2025 20:21:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-target-column-with-optional-source-columns/m-p/111082#M43783</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-02-24T20:21:14Z</dc:date>
    </item>
  </channel>
</rss>

