XML to Parquet files
08-08-2024 09:01 PM
I have a requirement to ingest large XML files and flatten the data before saving it as Parquet files. I'm reading the files with the spark-xml library, and I have written a Python function that flattens the complex types (array and struct) in the resulting DataFrame. My concern is that ingestion and flattening together take a long time (more than 1 hour). Is there a more efficient way to do this?
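For illustration, the kind of flattening described above can be sketched in plain Python on a single nested record. This is not the actual function from my job and it does not use Spark; the name `flatten_record` and the `parent_child` column naming are assumptions made only for this example. A dict stands in for a struct and a list stands in for an array.

```python
from itertools import product
from typing import Any, Dict, List


def flatten_record(record: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten one nested record into a list of flat rows.

    dict -> struct: keys become prefixed column names (parent_child)
    list -> array:  each element produces its own row (similar to an explode)
    """
    # Each field contributes a list of partial rows; they are combined below.
    per_field: List[List[Dict[str, Any]]] = []
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            per_field.append(flatten_record(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            rows: List[Dict[str, Any]] = []
            for item in value:
                if isinstance(item, dict):
                    rows.extend(flatten_record(item, prefix=f"{name}_"))
                else:
                    # Scalars (and nested lists) are kept as-is in this sketch.
                    rows.append({name: item})
            # Keep the parent row when the array is empty.
            per_field.append(rows or [{name: None}])
        else:
            per_field.append([{name: value}])

    # Cross product of the per-field partial rows gives the final flat rows.
    flat_rows: List[Dict[str, Any]] = []
    for combo in product(*per_field):
        row: Dict[str, Any] = {}
        for part in combo:
            row.update(part)
        flat_rows.append(row)
    return flat_rows


order = {"id": 1, "customer": {"name": "A"}, "items": [{"sku": "x"}, {"sku": "y"}]}
for flat in flatten_record(order):
    print(flat)
```

Note that in this sketch sibling arrays are combined as a cross product, so a record with several arrays produces a row count equal to the product of their lengths.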