Topics with Label: Shuffle

Forum Posts


by Ankith • New Contributor

05-11-2023 4:51:55 AM

1470 Views
2 replies
1 kudos

Resolved! How to enable spark.shuffle.compress in Spark 3.3.0 or later?

When I try to set that, I get the following error. I'd appreciate your comments. Thank you in advance.
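The error text is not shown in the post, so the following is only a sketch under one assumption: that the failure comes from changing spark.shuffle.compress on an already running session. Core (non-SQL) settings like this one are normally supplied before the application starts. The helper name `render_conf_args` and the script name `job.py` are hypothetical and used only for illustration.

```python
# Hypothetical helper (not part of Spark): renders launch-time settings
# as `--conf key=value` arguments for spark-submit.
def render_conf_args(settings):
    args = []
    for key, value in sorted(settings.items()):
        # Spark expects lowercase "true"/"false" for boolean options.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.extend(["--conf", f"{key}={value}"])
    return args


# Assumed settings for illustration; adjust to your own job.
launch_settings = {
    "spark.shuffle.compress": True,
    "spark.io.compression.codec": "zstd",
}

conf_args = render_conf_args(launch_settings)
# job.py is a placeholder for your application script.
print(" ".join(["spark-submit", *conf_args, "job.py"]))
```

On a managed cluster, the same key/value pairs would typically go into the cluster's Spark configuration instead of a spark-submit command line.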

Data Engineering


Latest Reply

Anonymous
Not applicable

05-21-2023 11:57:16 PM

1 kudos

Hi @Ankith Patlolla Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...


by alejandrofm • Valued Contributor

04-20-2023 5:44:19 AM

1655 Views
2 replies
2 kudos

Resolved! Lots of shuffle write on OPTIMIZE + ZORDER: is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me. It could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...
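One reason heavy shuffle write is plausible here: clustering by a Z-order key means rows are regrouped by an interleaved key instead of staying where they were written. The sketch below shows only the bit-interleaving idea (a Morton code) on two small integer columns. It is not Delta Lake's implementation, and the name `z_value` is hypothetical.

```python
def z_value(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


# Rows arrive in arbitrary order; sorting by the Z-value moves them around,
# which in a distributed engine translates into shuffled data.
points = [(0, 0), (3, 3), (1, 0), (0, 1), (2, 2), (1, 1)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
```

Rows that are close in both columns end up close in the sorted order, which is what makes file skipping effective on either column afterwards.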

Data Engineering


Latest Reply

Anonymous
Not applicable

04-23-2023 8:05:20 AM

2 kudos

Hi @Alejandro Martinez Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...


by user_b22ce5eeAl • New Contributor II

08-13-2021 7:07:18 AM

862 Views
2 replies
0 kudos

Pandas UDF of type grouped map fails

Hello, I am trying to get the shap values for my whole dataset using pandas udf for each category of a categorical variable. It runs well when I run it on a few categories but when I want to run the function on the whole dataset my job fails. I see ...

Data Engineering


Latest Reply

Jackson
New Contributor II

08-16-2021 9:01:03 PM

0 kudos

I want to use data.groupby.apply() to apply a function to each row of my PySpark DataFrame per group. I used the Grouped Map Pandas UDFs. However, I can't figure out how to add another argument to my function. I tried using the ar...


by Anonymous • Not applicable

06-18-2021 2:12:44 PM

10104 Views
1 reply
0 kudos

Resolved! Tuning shuffle partitions

Is the best practice for tuning shuffle partitions to have the config "autoOptimizeShuffle.enabled" on? I see it is not switched on by default. Why is that?

Data Engineering


Latest Reply

sajith_appukutt
Honored Contributor II

06-21-2021 1:02:37 PM

0 kudos

AQE (enabled by default from Databricks Runtime 7.3 LTS onwards) adjusts the shuffle partition number automatically at each stage of the query, based on the size of the map-side shuffle output. So as data size grows or shrinks over different stages, the task size wi...
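The idea in this reply can be sketched as follows: once the map-side output sizes are known, small adjacent shuffle partitions are merged until each group is close to a target size. This is a simplified illustration, not Spark's actual algorithm. The function name `coalesce_partitions` and the sizes are made up. In Spark itself, the related settings are spark.sql.adaptive.coalescePartitions.enabled and spark.sql.adaptive.advisoryPartitionSizeInBytes.

```python
def coalesce_partitions(sizes_mb, target_mb):
    """Greedily merge adjacent shuffle partitions up to roughly target_mb each."""
    groups, current, current_size = [], [], 0
    for index, size in enumerate(sizes_mb):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Made-up map-side output sizes (MB) for seven shuffle partitions.
map_output_sizes_mb = [10, 20, 40, 5, 5, 64, 30]
groups = coalesce_partitions(map_output_sizes_mb, target_mb=64)
print(groups)
```

With these numbers, seven shuffle partitions collapse into four reduce tasks, and the grouping would change on its own if a later stage produced more or less data.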
