- 1073 Views
- 0 replies
- 1 kudos
Hi all, what is the general guideline for handling flat files (XML, JSON with several nested hierarchies and an evolving schema) in the bronze layer? Should I persist the file content into a single column as text in the Parquet file, or should I l...
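One common pattern (a minimal stdlib sketch, not official Databricks guidance; the record layout and field names below are hypothetical) is to land the raw payload as a single opaque string column plus ingestion metadata in bronze, and defer all schema-dependent parsing to the silver layer, so schema evolution never breaks ingestion:

```python
import json
from datetime import datetime, timezone

def to_bronze_record(raw_payload: str, source_file: str) -> dict:
    """Wrap a raw (possibly schema-evolving) payload as-is for the bronze layer.

    The payload is kept as an opaque string; parsing into typed columns is
    deferred to silver, so new or changed fields never break ingestion.
    """
    return {
        "raw": raw_payload,
        "source_file": source_file,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def extract_silver_fields(bronze_record: dict) -> dict:
    """Silver-layer parse: pull out only the fields currently needed."""
    doc = json.loads(bronze_record["raw"])
    # Tolerant nested access, so older payload versions still load.
    return {"order_id": doc.get("order", {}).get("id")}

payload = '{"order": {"id": 42, "items": [{"sku": "A1"}]}}'
rec = to_bronze_record(payload, "s3://bucket/landing/file1.json")
print(extract_silver_fields(rec))  # {'order_id': 42}
```

The trade-off: bronze stays append-only and schema-proof, at the cost of re-parsing the text column downstream.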
by Kash • Contributor III
- 18666 Views
- 18 replies
- 13 kudos
Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it, our cluster's CPU spikes to 100%. We are not doing any transformations, but s...
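One likely contributor (an assumption, since the cluster configuration isn't shown): gzip is not a splittable codec, so each .json.gz file must be decompressed sequentially by a single task, and letting Spark infer the JSON schema adds a second full read of the data. The stdlib sketch below only illustrates the sequential nature of a gzip stream; on Spark the usual fixes are to supply an explicit schema and to land many smaller files rather than a few large ones.

```python
import gzip
import io
import json

# A gzip stream can only be decompressed sequentially from the start,
# which is why one .json.gz file maps to one task (gzip is not splittable).
lines = [json.dumps({"id": i}) for i in range(3)]
buf = io.BytesIO()
with gzip.open(buf, "wt") as f:
    f.write("\n".join(lines))

buf.seek(0)
with gzip.open(buf, "rt") as f:
    records = [json.loads(line) for line in f]

print(records)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```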
Latest Reply
Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load, but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...
- 1622 Views
- 1 replies
- 0 kudos
We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process has been taking longer. How can I optimize this pipeline?
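One common optimization (hedged: the table, key, and partition column names below are hypothetical) is to narrow the MERGE condition so Delta only has to consider files in the partitions actually present in the current micro-batch. A sketch of building such a pruning predicate as a string:

```python
def build_merge_condition(key_col: str, partition_col: str,
                          batch_dates: list) -> str:
    """Build a MERGE ON clause that includes a partition-pruning predicate.

    Restricting target.<partition_col> to the dates present in the current
    micro-batch lets Delta skip files in untouched partitions during the
    file-matching (inner join) step of MERGE.
    """
    dates = ", ".join(f"'{d}'" for d in sorted(set(batch_dates)))
    return (
        f"target.{key_col} = source.{key_col} "
        f"AND target.{partition_col} IN ({dates})"
    )

cond = build_merge_condition("event_id", "event_date",
                             ["2023-05-01", "2023-05-02"])
print(cond)
# target.event_id = source.event_id AND target.event_date IN ('2023-05-01', '2023-05-02')
```

Inside foreachBatch, the date list would be collected from the micro-batch DataFrame before issuing the MERGE; this only helps if the target table is partitioned on that column.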
Latest Reply
Delta Lake completes a MERGE in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in the target and source tables and write out the update...
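The two steps can be illustrated with a toy in-memory sketch (plain Python, updates only; this is not Delta's actual implementation): step 1 selects the target "files" containing matching keys, and step 2 rewrites only those files, carrying untouched files over unchanged.

```python
def toy_merge(target_files, source_rows):
    """Toy illustration of Delta's two-step MERGE (updates only).

    target_files: dict mapping file name -> {key: row} contents.
    source_rows:  dict mapping key -> updated row value.
    """
    source_keys = set(source_rows)

    # Step 1: "inner join" - select only files with at least one match.
    touched = {name for name, rows in target_files.items()
               if source_keys & set(rows)}

    # Step 2: "outer join" - rewrite the selected files with updates applied;
    # files with no matches are not rewritten at all.
    rewritten = {}
    for name, rows in target_files.items():
        if name in touched:
            rewritten[name] = {k: source_rows.get(k, v)
                               for k, v in rows.items()}
        else:
            rewritten[name] = rows
    return touched, rewritten

files = {"f1": {1: "a", 2: "b"}, "f2": {3: "c"}}
touched, result = toy_merge(files, {2: "B"})
print(touched)       # {'f1'}
print(result["f1"])  # {1: 'a', 2: 'B'}
```

This is why anything that shrinks step 1's matching set, such as a partition predicate in the merge condition, directly reduces how much data MERGE rewrites.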