Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kash
by Contributor III
  • 6048 Views
  • 19 replies
  • 13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETLs take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it our cluster's CPU spikes to 100%. We are not doing any transformations but s...

Latest Reply
Kash
Contributor III
  • 13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load, but I did not see any change in how our data loads or the time it takes to load data. I've done some additional ...

18 More Replies
User16826992666
by Valued Contributor
  • 726 Views
  • 1 reply
  • 0 kudos

Resolved! I know my partitions are skewed, is there anything I can do to help my performance?

I know the skew in my dataset has the potential to cause issues with my job performance, so just wondering if there is anything I can do to help my performance other than repartitioning the whole dataset.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

For scenarios like this, it is recommended to use a cluster with Databricks Runtime 7.3 LTS or above, where AQE is enabled. AQE dynamically handles skew in sort merge join and shuffle hash join by splitting (and replicating if needed) skewed tasks into ...
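
To make the skew handling mentioned in this reply more concrete, here is a rough sketch of the rule that open-source Spark documents for deciding whether a shuffle partition counts as skewed: it must be larger than a factor times the median partition size and also larger than an absolute byte threshold. The function name `skewed_partitions` is made up for illustration, and the defaults (factor 5, threshold 256 MB) are the documented open-source Spark 3.x values for `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`; a given Databricks Runtime may use different values, so treat this as an approximation rather than the engine's actual code.

```python
from statistics import median

# Assumed defaults, taken from open-source Spark 3.x documentation.
# Databricks Runtime may ship different values.
SKEW_FACTOR = 5
SKEW_THRESHOLD_BYTES = 256 * 1024 * 1024


def skewed_partitions(sizes_bytes, factor=SKEW_FACTOR, threshold=SKEW_THRESHOLD_BYTES):
    """Return the indexes of partitions that this rule would flag as skewed.

    Hypothetical helper: a partition is flagged only if it exceeds both
    factor * median size and the absolute byte threshold.
    """
    if not sizes_bytes:
        return []
    med = median(sizes_bytes)
    return [
        i
        for i, size in enumerate(sizes_bytes)
        if size > factor * med and size > threshold
    ]


if __name__ == "__main__":
    MB = 1024 * 1024
    # Three similar partitions and one much larger one.
    print(skewed_partitions([100 * MB, 120 * MB, 110 * MB, 2000 * MB]))
```

Running a check like this over partition sizes collected from a stage (for example, from the Spark UI) can help confirm whether a partition is large enough for AQE to split it, or whether it is merely uneven but below the threshold.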

User16826992666
by Valued Contributor
  • 718 Views
  • 1 reply
  • 0 kudos

Resolved! Do I still need to use skew join hints if I have Adaptive Query Execution enabled?

From what I have read about AQE it seems to do a lot of what skew join hints did automatically. So should I still be using skew hints in my queries? Is there harm in using them?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

With AQE, Databricks has the most up-to-date, accurate statistics at the end of a query stage and can opt for a better physical strategy and/or do optimizations that used to require hints. In the case of skew join hints, it is recommended to rely on AQE...
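
For anyone who wants to check that a session is actually relying on AQE rather than hints, a minimal sketch follows. It assumes a session object exposing `conf.set(key, value)`, as PySpark's `SparkSession` does; the helper name `apply_aqe_skew_settings` and the `AQE_SKEW_SETTINGS` dictionary are hypothetical. The two keys are standard open-source Spark settings, and on recent runtimes they may already be on by default, in which case setting them is harmless.

```python
# Standard open-source Spark configuration keys for AQE and its skew join handling.
AQE_SKEW_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_aqe_skew_settings(spark):
    """Apply the AQE skew join settings to a session-like object.

    Hypothetical helper: `spark` only needs a `conf.set(key, value)` method,
    which PySpark's SparkSession provides.
    """
    for key, value in AQE_SKEW_SETTINGS.items():
        spark.conf.set(key, value)
    return spark
```

In a notebook this would be called as `apply_aqe_skew_settings(spark)` before running the join, with any skew hints left out of the query.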
