Data Engineering

Forum Posts

by Kash • Contributor III

06-09-2022 6:49:15 AM

18488 Views
18 replies
13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETL's take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it our cluster's CPU spikes to 100%. We are not doing any transformations but s...

Latest Reply

Kash
Contributor III

06-15-2022 5:47:02 AM

13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...

by User16826992666 • Valued Contributor

06-16-2021 8:41:10 PM

1614 Views
1 reply
0 kudos

Resolved! I know my partitions are skewed, is there anything I can do to help my performance?

I know the skew in my dataset has the potential to cause issues with my job performance, so just wondering if there is anything I can do to help my performance other than repartitioning the whole dataset.

Latest Reply

sajith_appukutt
Honored Contributor II

06-17-2021 11:16:06 PM

0 kudos

For scenarios like this, it is recommended to use a cluster with Databricks Runtime 7.3 LTS or above, where AQE is enabled. AQE dynamically handles skew in sort merge join and shuffle hash join by splitting (and replicating if needed) skewed tasks into ...
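
As a rough illustration of what "splitting skewed tasks" can mean, here is a minimal sketch in plain Python (no Spark required). It models the rule given in the open-source Spark documentation: a partition counts as skewed when it is larger than a factor times the median partition size and also larger than a byte threshold. The function names `find_skewed` and `split_count` are hypothetical, and the defaults (factor 5, 256 MB threshold, 64 MB target chunk) are assumptions taken from open-source Spark settings; they may differ on a given Databricks Runtime.

```python
from statistics import median

MB = 1024 * 1024


def find_skewed(partition_sizes, factor=5.0, threshold_bytes=256 * MB):
    """Return indices of partitions this simplified rule treats as skewed.

    A partition is flagged when it exceeds both factor * median size
    and the absolute byte threshold (assumed defaults, see lead-in).
    """
    med = median(partition_sizes)
    return [
        i
        for i, size in enumerate(partition_sizes)
        if size > factor * med and size > threshold_bytes
    ]


def split_count(size, target_bytes=64 * MB):
    """Number of roughly target-sized chunks a skewed partition is cut into."""
    return max(1, -(-size // target_bytes))  # ceiling division


# Hypothetical shuffle partition sizes: one partition dwarfs the rest.
sizes = [40 * MB, 50 * MB, 45 * MB, 900 * MB, 55 * MB]
skewed = find_skewed(sizes)
plan = {i: split_count(sizes[i]) for i in skewed}
print(plan)
```

With these made-up sizes, only the 900 MB partition passes both checks, so it alone is broken into smaller chunks while the others are left as they are. This is only a model of the decision, not the engine's actual implementation.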

by User16826992666 • Valued Contributor

06-16-2021 8:59:37 PM

1435 Views
1 reply
0 kudos

Resolved! Do I still need to use skew join hints if I have Adaptive Query Execution enabled?

From what I have read about AQE it seems to do a lot of what skew join hints did automatically. So should I still be using skew hints in my queries? Is there harm in using them?

Latest Reply

sajith_appukutt
Honored Contributor II

06-17-2021 11:13:31 PM

0 kudos

With AQE, Databricks has the most up-to-date, accurate statistics at the end of a query stage and can opt for a better physical strategy and/or do optimizations that used to require hints. In the case of skew join hints, it is recommended to rely on AQE...
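
If you want to confirm that AQE skew handling is switched on rather than adding hints, something like the following might help. This is a sketch under assumptions: `AQE_SETTINGS` and `apply_settings` are hypothetical names, and the two keys are the standard open-source Spark SQL configuration names. On runtimes where AQE is already enabled by default, setting them is a no-op.

```python
# Standard Spark SQL configuration keys for AQE and its skew join handling.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_settings(conf, settings=AQE_SETTINGS):
    """Apply each setting through any object exposing set(key, value).

    In a notebook this would typically be `spark.conf`; the helper itself
    is hypothetical and only loops over the dictionary.
    """
    for key, value in settings.items():
        conf.set(key, value)
    return conf
```

In a notebook where a `spark` session already exists, this would be called as `apply_settings(spark.conf)`.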

Databricks Community
