Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

What are the recommended practices for handling skewed datasets in Databricks?

Suheb
New Contributor III

What should you do when your dataset is uneven (some values appear too many times and others appear very few times) while working in Databricks?


szymon_dybczak
Esteemed Contributor III

Hi @Suheb ,

Refer to the really good guide prepared by the Databricks team. When you have a skewed dataset, the primary things you can do are the following:

1. Filter skewed values

2. Apply Skew hints

3. AQE skew optimization

4. Salting
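
To make the salting idea a bit more concrete, here is a minimal sketch in plain Python (no Spark needed), so treat it as an illustration of the idea rather than the way the guide implements it. The sample data, `NUM_SALTS`, and all function names are made up for this example. The idea: add a random salt to the key on the skewed side, replicate the small side once per salt value, and join on (key, salt), so a hot key is spread across several groups instead of one.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical bucket count; tune it to how skewed the key is


def salt_large_side(rows, num_salts=NUM_SALTS, seed=0):
    """Attach a random salt to the key of every (key, value) row on the skewed side."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate every (key, value) row on the small side once per salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Plain inner join of two lists of (key, value) rows on the key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


# Made-up skewed data: the key "US" dominates the large side.
orders = [("US", i) for i in range(1000)] + [("PL", 1000), ("DE", 1001)]
countries = [("US", "United States"), ("PL", "Poland"), ("DE", "Germany")]

salted_orders = salt_large_side(orders)
salted_countries = explode_small_side(countries)

# Join on (key, salt), then drop the salt from the result.
joined = [
    (key, order_id, name)
    for (key, _salt), order_id, name in hash_join(salted_orders, salted_countries)
]

# Size of the largest key group before and after salting.
largest_before = max(Counter(key for key, _ in orders).values())
largest_after = max(Counter(key for key, _ in salted_orders).values())
```

The join result is the same as without salting, but the largest key group gets smaller. The price is that the small side grows by a factor of `NUM_SALTS`, which is why salting is usually reserved for the few keys that are really hot.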

A much more detailed description of the above techniques can be found in the guide below:

Comprehensive Guide to Optimize Data Workloads | Databricks
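
For point 3 (AQE skew optimization), a rough sketch of the relevant settings could look like this. The configuration keys are standard Spark SQL options, but the values are only illustrative, and the helper name `enable_aqe_skew_handling` is made up for this example. On recent runtimes AQE is typically already enabled by default, so you may only need to touch the thresholds.

```python
# Illustrative values only, not tuned recommendations.
AQE_SKEW_SETTINGS = {
    # Turn on Adaptive Query Execution and its skew-join handling.
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # A partition counts as skewed when it is larger than this factor times
    # the median partition size AND larger than the byte threshold below.
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def enable_aqe_skew_handling(spark, settings=None):
    """Apply the AQE skew-join settings to an existing SparkSession.

    `spark` is expected to be a SparkSession (in a Databricks notebook it is
    already defined); only `spark.conf.set` is used.
    """
    applied = dict(AQE_SKEW_SETTINGS if settings is None else settings)
    for key, value in applied.items():
        spark.conf.set(key, value)
    return applied
```

In a notebook you would call `enable_aqe_skew_handling(spark)` before running the skewed join, then check in the Spark UI whether the join was actually split into smaller tasks.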