XML file Load to Delta table with different fields list
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-11-2024 08:15 PM - edited 08-11-2024 08:29 PM
I there,
I have a scenario where the source XML files may have all the fields or may be 80% of fields in next run. How to we load the files in Delta tables which should handle the XML files with all field lists and also with few fields only.
In smaller field list, the problem is MERGE operation is failing as no of rows are not matching between source and target table.
Any suggestion ?
- Labels:
-
Spark
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-11-2024 10:03 PM
Hi Atul,
You can try to handle XML files with varying fields (based on multiple loads) when loading into Delta tables, ensure the Delta table can adapt to changes by enabling schema evolution. Fill in missing fields with default values (like NULL) before merging the data. This way, even if the XML file has fewer fields, the process will handle it smoothly without causing errors during the MERGE operation. Just a thought. Give a try.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-11-2024 10:46 PM
The question is how to find the missing or new fields in nested XML files?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-12-2024 12:39 AM
@Atul-Kumar Have you tried to use the native XML support (introduced in DBR 14.3) with Auto Loader + Rescue Data instead?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-25-2024 11:04 PM
Auto Loader is not acceptable solution in my case. I tried to make an empty table using XSD file and then load the data frame. Some how it worked to meet the objective.