Data Skewness

dyusuf
New Contributor II

I am trying to visualize data skewness with a simple aggregation example: a groupBy operation on a DataFrame. The data is highly skewed toward one customer, yet Databricks appears to balance it automatically when I check the Spark UI. Is there any configuration I need to disable so that the skew is visible in the Spark UI?

Please clarify.

 

Thanks,

Yusuf

SantoshJoshi
New Contributor III

Hi @dyusuf ,

It could be because AQE (Adaptive Query Execution) is enabled.

...AQE, dynamically handles skew...

Please refer to the link below for more details:

https://docs.databricks.com/aws/en/optimizations/aqe

Can you please disable AQE and check if this works?

spark.conf.set("spark.sql.adaptive.enabled", "false")

HTH

 

dyusuf
New Contributor II

Thank you for your response. I already tried disabling AQE, but it doesn't work. Is there any other way we could see it?
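One possible explanation, offered as an assumption and not as a confirmed diagnosis of this case: for a groupBy aggregation, Spark usually pre-aggregates within each task before the shuffle (a partial aggregation step in the physical plan), so the heavy customer reaches the shuffle as only a handful of rows. The sketch below is plain Python, not Spark code. The task count, reducer count, row counts, and the modulo partitioner are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

NUM_MAP_TASKS = 8     # hypothetical number of input partitions (map tasks)
NUM_REDUCERS = 4      # hypothetical number of shuffle partitions
ROWS_PER_TASK = 1000  # hypothetical rows read by each map task


def make_task_rows():
    """Build one task's rows: 90% belong to customer 1, the rest to customers 2..101."""
    rows = []
    for i in range(ROWS_PER_TASK):
        customer = 1 if i % 10 < 9 else 2 + i // 10
        rows.append((customer, 1))  # (customer_id, amount)
    return rows


def reducer_for(customer):
    # Stand-in for hash partitioning; Spark uses its own hash function.
    return customer % NUM_REDUCERS


tasks = [make_task_rows() for _ in range(NUM_MAP_TASKS)]

# Case 1: every raw row is shuffled to the reducer that owns its key.
raw_shuffle = Counter()
for rows in tasks:
    for customer, _ in rows:
        raw_shuffle[reducer_for(customer)] += 1

# Case 2: each task first sums per customer, then shuffles one row per key.
pre_agg_shuffle = Counter()
totals = Counter()
for rows in tasks:
    partial = Counter()
    for customer, amount in rows:
        partial[customer] += amount
    for customer, subtotal in partial.items():
        pre_agg_shuffle[reducer_for(customer)] += 1
        totals[customer] += subtotal

print("raw rows per reducer:           ", dict(sorted(raw_shuffle.items())))
print("pre-aggregated rows per reducer:", dict(sorted(pre_agg_shuffle.items())))
```

In this toy model the raw shuffle piles almost all rows onto a single reducer, while the pre-aggregated shuffle is nearly even. If the same effect applies here, skew may be easier to observe in an operation that shuffles raw rows, such as a join on the skewed key or a repartition by that key, with AQE still disabled.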