Update target column with optional source columns
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-18-2025 01:33 PM
I am trying to load on prem transaction log table and update a databricks table.In Source table,the column reqdetails table hold all the information and expect ProductID remaining columns are dynamic(not all columns exists in the request) all time the optional/nullable.Could you please let me the process using python in databricks ? Explored StructType but it needs mandatory specific columns.
Source Table
Req ID | Type | Req Details | Status |
1 | Update | ProductID=234;ProductName=LawnMover;Price=58 | True |
2 | Update | ProductID=874;Price=478 | True |
3 | Update | ProductID=678;ProductParentgroup=Watersuppuly;Price=1.6 | True |
Target table before Update
ProductID | ProductParentgroup | ProductName | Price |
234 | Utility | Mover | 86 |
874 | HOA | Sink | 450 |
678 | Water | Filters | 1.2 |
Target table after Update
ProductID | ProductParentgroup | ProductName | Price |
234 | Utility | LawnMover | 58 |
874 | HOA | Sink | 478 |
678 | Watersupply | Filters | 1.6 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a month ago
Hi Ruc
How are you doing today?, as per my understanding, Since your source table has a ReqDetails column with dynamic key-value pairs and only ProductID is fixed, you can parse the ReqDetails column dynamically and merge it into the target table using PySpark and Delta Lake. First, read the source data into a DataFrame and use a UDF (User-Defined Function) to split ReqDetails into a dictionary of key-value pairs. Then, extract only the available keys (e.g., ProductName, Price, ProductParentgroup) and create new columns dynamically. Finally, use Delta Lake’s MERGE operation to update only the modified values in the target table, using COALESCE() to retain existing data when a column is missing in the request. This way, you avoid a rigid schema while ensuring that updates happen correctly. Hope this helps!
Regards,
Brahma

