Data Engineering

Forum Posts


by Anonymous • Not applicable

06-18-2021 2:12:44 PM

26059 Views
7 replies
0 kudos

Resolved! Tuning shuffle partitions

Is the best practice for tuning shuffle partitions to have the config "autoOptimizeShuffle.enabled" on? I see it is not switched on by default. Why is that?
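For readers who set the partition count by hand instead of relying on the automatic option, one common rule of thumb is to divide the expected shuffle size by a target partition size and round up to a multiple of the available cores. The sketch below illustrates that arithmetic only. The function name, the 128 MiB default target, and the rounding step are assumptions for illustration and do not come from the thread or from any Spark API.

```python
def estimate_shuffle_partitions(shuffle_bytes: int,
                                target_partition_bytes: int = 128 * 1024 * 1024,
                                total_cores: int = 1) -> int:
    """Rule-of-thumb partition count (hypothetical helper, not a Spark API)."""
    if shuffle_bytes < 0 or target_partition_bytes <= 0 or total_cores <= 0:
        raise ValueError("sizes must be non-negative and divisors positive")
    # Ceiling division: enough partitions that none exceeds the target size.
    by_size = -(-shuffle_bytes // target_partition_bytes)
    # Round up to a whole number of task "waves" so every core gets work.
    waves = max(1, -(-by_size // total_cores))
    return waves * total_cores


# Assumed example: 200 GiB of shuffle data on a cluster with 64 cores.
n = estimate_shuffle_partitions(200 * 1024**3, total_cores=64)
print(n)
# The result could then be applied with:
#   spark.conf.set("spark.sql.shuffle.partitions", n)
```

The estimate is only a starting point; the actual shuffle size should be checked in the Spark UI before settling on a value.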


Latest Reply

mtajmouati
Contributor

07-03-2024 1:11:51 PM

0 kudos

AQE applies to all queries that are:
- Non-streaming
- Contain at least one exchange (usually when there’s a join, aggregate, or window), one sub-query, or both.

Not all AQE-applied queries are necessarily re-optimized. The re-optimization might or might no...
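The eligibility rule quoted in this reply can be written as a small predicate. This is a minimal sketch of the stated conditions only: the function and its parameters are hypothetical names chosen for illustration, not part of Spark, and it says nothing about whether an eligible query actually gets re-optimized.

```python
def aqe_applies(is_streaming: bool, num_exchanges: int, num_subqueries: int) -> bool:
    """Return True if a query meets the conditions described in the reply.

    Hypothetical helper: Spark makes this decision internally from the
    physical plan; the arguments here merely stand in for that plan.
    """
    # Condition 1: the query must be non-streaming.
    # Condition 2: at least one exchange (join, aggregate, window), or
    #              at least one sub-query, or both.
    return (not is_streaming) and (num_exchanges > 0 or num_subqueries > 0)


print(aqe_applies(is_streaming=False, num_exchanges=1, num_subqueries=0))
print(aqe_applies(is_streaming=True, num_exchanges=1, num_subqueries=0))
```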


6 More Replies

by Ankith • New Contributor

05-11-2023 4:51:55 AM

4368 Views
2 replies
1 kudos

Resolved! How to enable spark.shuffle.compress in spark 3.3.0 or above versions?

When I try to set that, I get the following error. I would appreciate your comments; thank you in advance.


Latest Reply

Anonymous
Not applicable

05-21-2023 11:57:16 PM

1 kudos

Hi @Ankith Patlolla, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...


1 More Replies

by alejandrofm • Valued Contributor

04-20-2023 5:44:19 AM

4208 Views
2 replies
2 kudos

Resolved! Lot of write shuffle on optimize + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me. It could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...


Latest Reply

Anonymous
Not applicable

04-23-2023 8:05:20 AM

2 kudos

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...


1 More Replies

by user_b22ce5eeAl • New Contributor II

08-13-2021 7:07:18 AM

1983 Views
2 replies
0 kudos

pandas udf type grouped map fails

Hello, I am trying to get the SHAP values for my whole dataset using a pandas UDF for each category of a categorical variable. It runs well when I run it on a few categories, but when I run the function on the whole dataset my job fails. I see ...


Latest Reply

Jackson
New Contributor II

08-16-2021 9:01:03 PM

0 kudos

I want to use data.groupby.apply() to apply a function to each row of my PySpark DataFrame per group. I used the Grouped Map Pandas UDFs. However, I can't figure out how to add another argument to my function. I tried using the ar...


1 More Replies

Databricks Community
