Autoloader Solution for Binary files
05-30-2023 02:26 AM
We have implemented a solution for ingesting binary files (.zip) into Delta Lake. Our pipeline currently works as follows:
- Unzip the file and extract the XML file.
- Parse the XML using Python libraries.
- Flatten the nested XML columns.
- Store the result in a Delta table.
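For reference, the steps above can be sketched roughly as below. This is a minimal illustration and not our actual pipeline code: the function names (`extract_xml_documents`, `flatten_element`, `zip_to_rows`), the underscore-joined column naming, and the table name in the final comment are all assumptions made for the example. It uses only the Python standard library for the unzip, parse, and flatten steps.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_xml_documents(zip_bytes):
    """Yield (member_name, xml_bytes) for every .xml member of an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                yield name, archive.read(name)


def flatten_element(element, prefix=""):
    """Flatten a nested element into one dict keyed by underscore-joined paths.

    Assumption: repeated sibling tags are not handled specially, so a later
    sibling with the same path overwrites an earlier one.
    """
    path = f"{prefix}_{element.tag}" if prefix else element.tag
    row = {}
    # Attributes become columns named <path>_<attribute>.
    for attr, value in element.attrib.items():
        row[f"{path}_{attr}"] = value
    children = list(element)
    if not children:
        # Leaf element: its text becomes the column value.
        row[path] = (element.text or "").strip()
        return row
    for child in children:
        row.update(flatten_element(child, path))
    return row


def zip_to_rows(zip_bytes):
    """Turn one ZIP archive into a list of flat dicts, one per XML member."""
    rows = []
    for name, xml_bytes in extract_xml_documents(zip_bytes):
        row = flatten_element(ET.fromstring(xml_bytes))
        row["source_file"] = name
        rows.append(row)
    return rows


# Final step (not executed here; needs a Spark session, table name is hypothetical):
# spark.createDataFrame(rows).write.format("delta").mode("append").saveAsTable("my_table")
```

In this sketch every file is handled one at a time in plain Python, which matches the sequential behavior described in this post.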
This solution works fine for a small set of files (25). When we process a larger set (650), it takes more time than expected.
We would like to know whether there is a better solution to speed up the process.
A few things to note about the XML file: it is nested and has around 600 columns.
Labels:
- Autoloader
- Binary file