by
Kash
• Contributor III
- 18488 Views
- 18 replies
- 13 kudos
Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it, our cluster's CPU spikes to 100%. We are not doing any transformations but s...
Latest Reply
Hi Kaniz, Thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load, but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...
- 1614 Views
- 1 replies
- 0 kudos
I know the skew in my dataset has the potential to cause issues with my job performance, so just wondering if there is anything I can do to help my performance other than repartitioning the whole dataset.
Latest Reply
For scenarios like this, it is recommended to use a cluster with Databricks Runtime 7.3 LTS or above, where AQE is enabled. AQE dynamically handles skew in sort merge join and shuffle hash join by splitting (and replicating if needed) skewed tasks into ...
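To make the skew handling described in this reply a bit more concrete, here is a minimal sketch. It assumes the open-source Spark 3.x AQE setting names and their documented defaults; a given Databricks Runtime may ship different values. The helpers `apply_conf` and `skewed_partitions` are hypothetical names made up for illustration, not part of any Spark or Databricks API. The second helper only mimics the documented rule (a partition counts as skewed when it is larger than the factor times the median partition size and also larger than the byte threshold), so you can reason about which partitions would qualify.

```python
from statistics import median

# Open-source Spark 3.x AQE keys with their documented default values.
# Assumption: your runtime may use different defaults, so check before relying on these.
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def apply_conf(spark, conf=AQE_SKEW_CONF):
    """Hypothetical helper: set each key on an existing SparkSession."""
    for key, value in conf.items():
        spark.conf.set(key, value)


def skewed_partitions(sizes_bytes, factor=5, threshold_bytes=256 * 1024 * 1024):
    """Hypothetical helper: indexes of partitions the documented rule would flag.

    A partition is flagged when its size exceeds factor * median size
    AND exceeds the absolute byte threshold.
    """
    med = median(sizes_bytes)
    return [
        i
        for i, size in enumerate(sizes_bytes)
        if size > factor * med and size > threshold_bytes
    ]
```

On a cluster you would call `apply_conf(spark)` once per session and then run the join as usual. `skewed_partitions` is only a back-of-the-envelope check on partition sizes you have measured yourself; it does not talk to Spark.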
- 1435 Views
- 1 replies
- 0 kudos
From what I have read about AQE it seems to do a lot of what skew join hints did automatically. So should I still be using skew hints in my queries? Is there harm in using them?
Latest Reply
With AQE, Databricks has the most up-to-date, accurate statistics at the end of a query stage and can opt for a better physical strategy and/or do optimizations that used to require hints. In the case of skew join hints, it is recommended to rely on AQE...