
What are the recommended practices for handling skewed datasets in Databricks?

Suheb
New Contributor III

What should you do when your dataset is uneven—some values appear too many times and others appear very few times—while working in Databricks?

1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @Suheb ,

Refer to the really good guide prepared by the Databricks team. When you have a skewed dataset, the primary things you can do are the following:

1. Filter skewed values

2. Apply Skew hints

3. AQE skew optimization

4. Salting
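To make the salting idea from the list above concrete, here is a small plain-Python simulation. It is not Spark code and not taken from the guide: the partition count, the salt count, the `crc32`-based partitioner, and the made-up key distribution are all assumptions chosen for illustration. It only shows why appending a random salt to a hot key spreads its rows over more partitions.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions
NUM_SALTS = 8       # hypothetical number of salt buckets per key


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's real one)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys, salted: bool, seed: int = 0) -> Counter:
    """Count how many rows land in each partition, with or without salting."""
    rng = random.Random(seed)
    sizes = Counter()
    for key in keys:
        if salted:
            # Append a random salt so one hot key becomes up to NUM_SALTS sub-keys.
            key = f"{key}#{rng.randrange(NUM_SALTS)}"
        sizes[partition_of(key)] += 1
    return sizes


# Made-up skewed data: one hot key with 8000 rows, 50 other keys with 20 rows each.
keys = ["hot"] * 8000 + [f"k{i}" for i in range(50) for _ in range(20)]

unsalted = partition_sizes(keys, salted=False)
salted = partition_sizes(keys, salted=True)

print("largest partition without salt:", max(unsalted.values()))
print("largest partition with salt:   ", max(salted.values()))
```

In a real Spark join the same trick has a cost: the other side of the join has to be replicated once per salt value so that every salted key still finds its match, so salting is usually worth trying only after the simpler options in the list.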

A more detailed description of the terms above can be found in the guide below:

Comprehensive Guide to Optimize Data Workloads | Databricks
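For the AQE skew optimization option, a minimal sketch of the relevant settings is shown here. The configuration keys are standard Spark SQL options, but the helper name `enable_aqe_skew_join` and the factor and threshold values are my own illustrative choices, not tuned recommendations from the guide. Recent runtimes typically have AQE enabled already, so check your current values before changing anything.

```python
def enable_aqe_skew_join(spark, factor="5", threshold="256MB"):
    """Turn on AQE skew-join handling for an existing SparkSession.

    `factor` and `threshold` are illustrative values, not recommendations.
    """
    # AQE must be on for the skew-join optimization to apply.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
    # A partition counts as skewed when it is larger than `factor` times the
    # median partition size AND larger than `threshold`.
    spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", factor)
    spark.conf.set(
        "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", threshold
    )
```

In a Databricks notebook the session is already available as `spark`, so the call would simply be `enable_aqe_skew_join(spark)`.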