Data Engineering

How do I choose which column to partition by?

User16826992666
Valued Contributor

I am in the process of building my data pipeline, but I am unsure of how to choose which fields in my data I should use for partitioning. What should I be considering when choosing a partitioning strategy?

1 REPLY

User16869510359
Esteemed Contributor

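The important factors when deciding on partition columns are:

  • Even distribution of data: pick a column whose values split the data into partitions of roughly equal size, so that no single partition is heavily skewed.
  • Query patterns: choose a column that is commonly used to filter or access the data, so that queries can skip the partitions they do not need.
  • Avoid multiple levels of partitioning, as you can end up with a large number of small files.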
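
One rough way to compare candidate columns against these factors is to check, on a sample of the data, how many partitions each column would produce and how unevenly the rows would spread across them. The sketch below is plain Python rather than any Spark or Databricks API; the helper `partition_stats`, the sample rows, and the column names are all hypothetical and only meant to illustrate the idea.

```python
from collections import Counter


def partition_stats(rows, column):
    """Summarise how `rows` would spread if partitioned by `column`.

    Hypothetical helper: `rows` is a list of dicts standing in for a data sample.
    """
    if not rows:
        raise ValueError("need at least one row to compute statistics")
    counts = Counter(row[column] for row in rows)
    num_partitions = len(counts)            # one partition per distinct value
    average = len(rows) / num_partitions    # mean rows per partition
    largest = max(counts.values())          # size of the biggest partition
    return {
        "partitions": num_partitions,
        "rows_per_partition": average,
        "skew": largest / average,          # 1.0 means perfectly even
    }


# Made-up sample data, for illustration only.
rows = [
    {"event_date": "2024-01-01", "country": "US", "user_id": 1},
    {"event_date": "2024-01-01", "country": "US", "user_id": 2},
    {"event_date": "2024-01-02", "country": "US", "user_id": 3},
    {"event_date": "2024-01-02", "country": "US", "user_id": 4},
    {"event_date": "2024-01-03", "country": "US", "user_id": 5},
    {"event_date": "2024-01-03", "country": "DE", "user_id": 6},
]

for candidate in ("event_date", "country", "user_id"):
    print(candidate, partition_stats(rows, candidate))
```

In this made-up sample, `event_date` spreads rows evenly, `country` is skewed toward one value, and `user_id` yields one partition per row, which is the small-files problem in miniature. On real data the same check would be run on a representative sample and weighed against which columns queries actually filter on.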