Data Engineering

Forum Posts

by 7effrey • New Contributor III

11-08-2022 5:32:52 AM

1419 Views
0 replies
1 kudos

Flatfiles ingestion on Bronze layer, 'to schema' or 'not to schemarize'?

Hi all, what is the general guideline for handling flat files (XML, or JSON with several nested hierarchies that is also schema-evolving) in the bronze layer? Should I persist the file content into a single column as text in the Parquet file or should I l...
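To make the first option concrete, here is a minimal stdlib-Python sketch. It assumes a bronze record is just the untouched file body plus a few ingestion metadata fields; the field names `source_file`, `ingested_at` and `raw_content` are hypothetical. Applying a schema (parsing) is deferred to a later step, so a new nested field in the source does not change the shape of the bronze record.

```python
import json
from datetime import datetime, timezone


def to_bronze_record(source_file: str, raw_content: str) -> dict:
    """Wrap an unparsed file body in a bronze record (hypothetical field names)."""
    return {
        "source_file": source_file,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw_content": raw_content,  # stored as-is; no schema applied yet
    }


def parse_bronze_record(record: dict) -> dict:
    """Later (e.g. silver) step: apply structure by parsing the stored text."""
    return json.loads(record["raw_content"])


# Two versions of the same feed: v2 adds a nested field that v1 lacks.
v1 = '{"id": 1, "customer": {"name": "A"}}'
v2 = '{"id": 2, "customer": {"name": "B", "address": {"city": "X"}}}'

bronze = [to_bronze_record("feed_v1.json", v1), to_bronze_record("feed_v2.json", v2)]
# Both records have the same three keys despite the upstream schema change.
print([sorted(r) for r in bronze])

parsed = [parse_bronze_record(r) for r in bronze]
print(parsed[1]["customer"]["address"]["city"])
```

The trade-off in this sketch is that every downstream reader has to parse `raw_content`; inferring a schema at bronze time instead moves the schema-evolution handling into the ingestion step.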

by Kash • Contributor III

06-09-2022 6:49:15 AM

22057 Views
18 replies
13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETLs take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it, our cluster's CPU spikes to 100%. We are not doing any transformations but s...
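One factor worth ruling out (an assumption here, since the post is truncated) is that gzip is not a splittable compression format: a gzip stream has no index, so a reader must decompress from the beginning of the file to reach any record, and in Spark each `.gz` file is typically read by a single task. The stdlib sketch below writes a small gzipped JSON Lines file and streams it back with `gzip.open`; the file name and record layout are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_gz_jsonl(path: str, records: list) -> None:
    """Write records as gzip-compressed JSON Lines (one JSON object per line)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_gz_jsonl(path: str):
    """Stream records back. The gzip stream has no index, so decompression
    starts at byte 0 and proceeds sequentially through the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.json.gz")  # hypothetical file name
    write_gz_jsonl(path, [{"event_id": i, "payload": "x" * 10} for i in range(1000)])
    records = list(read_gz_jsonl(path))

print(len(records), records[-1]["event_id"])
```

If this turns out to be the bottleneck, a common remedy is to pay the decompression cost once and rewrite the data into a splittable or columnar form (for example Parquet/Delta) so later reads can be parallelized; whether that applies here depends on file sizes and counts not visible in the truncated post.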

Latest Reply

Kash
Contributor III

06-15-2022 5:47:02 AM

13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load but did not see any change in how our data loads or in the time it takes to load. I've done some additional ...

by sajith_appukutt • Databricks Employee

06-13-2021 4:55:00 PM

2030 Views
1 reply
0 kudos

Resolved! MERGE operation on PI data getting slower. How can I debug?

We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process has been taking longer. How can I optimize this pipeline?

Latest Reply

sajith_appukutt
Databricks Employee

06-21-2021 3:08:55 PM

0 kudos

Delta Lake completes a MERGE in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in the target and source tables and write out the update...
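The two steps can be simulated in plain Python to show why MERGE cost tracks the number of target files that contain a match. This is a toy model, not Delta Lake's implementation, and it only covers the two steps named in the (truncated) reply: a "file" is a list of row dicts keyed by a hypothetical `id` column, and any file with a match is rewritten whole.

```python
def merge(target_files: list, source: list, key: str = "id"):
    """Toy two-step MERGE (upsert) over a table stored as a list of files.

    Step 1 (inner join): find target files holding at least one matching key.
    Step 2 (outer join): rewrite only those files with updated rows, and
    write unmatched source rows as a new file (inserts).
    Returns (new_files, number_of_files_rewritten).
    """
    src_by_key = {row[key]: row for row in source}

    # Step 1: inner join on key to pick the files that must be rewritten.
    touched = {
        i for i, f in enumerate(target_files)
        if any(row[key] in src_by_key for row in f)
    }

    new_files, matched_keys = [], set()
    for i, f in enumerate(target_files):
        if i not in touched:
            new_files.append(f)  # files without matches are kept as-is
            continue
        # Step 2: outer join within a selected file; matched rows take source values.
        rewritten = []
        for row in f:
            if row[key] in src_by_key:
                matched_keys.add(row[key])
                rewritten.append(src_by_key[row[key]])
            else:
                rewritten.append(row)
        new_files.append(rewritten)

    # Source rows that matched nothing become inserts in a new file.
    inserts = [row for k, row in src_by_key.items() if k not in matched_keys]
    if inserts:
        new_files.append(inserts)
    return new_files, len(touched)


# Three files of 3 rows each; the source updates id 4 and inserts id 100.
target = [[{"id": i, "v": 0} for i in range(s, s + 3)] for s in (0, 3, 6)]
result, rewritten = merge(target, [{"id": 4, "v": 1}, {"id": 100, "v": 1}])
print(rewritten)   # 1: only the file holding id 4 is rewritten
print(result[1])   # [{'id': 3, 'v': 0}, {'id': 4, 'v': 1}, {'id': 5, 'v': 0}]
print(result[-1])  # [{'id': 100, 'v': 1}]
```

In this model, if each micro-batch's keys are scattered across many target files, step 1 selects more files and step 2 rewrites more data every batch, which is one plausible way a MERGE inside foreachBatch slows down as the table grows.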

Databricks Community