How do I choose which column to partition by?
06-24-2021 03:06 PM
I am building my data pipeline, but I am unsure which fields in my data I should use for partitioning. What should I consider when choosing a partitioning strategy?
Labels: Data Pipeline, Partition
1 REPLY
06-24-2021 04:22 PM
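The main factors to weigh when choosing partition columns are:
- Even data distribution: pick a column whose values split the data into partitions of roughly equal size, so that no single partition is heavily skewed.
- Query patterns: pick a column that is commonly used to filter queries, so that a query can skip the partitions it does not need.
- Partition depth: avoid multiple levels of partitioning, as you can end up with a large number of small files.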
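One way to check these factors before committing to a layout is to count how many rows would land in each partition for a candidate column. The snippet below is only a rough sketch in plain Python, not tied to any particular engine: the `partition_stats` helper, the sample rows, and the column names (`country`, `event_date`) are all made up for illustration, and in practice you would run the equivalent group-by count on a sample of your own data.

```python
from collections import Counter


def partition_stats(rows, columns):
    """Summarize how rows would spread across partitions keyed by `columns`.

    `rows` is a list of dicts and `columns` is a list of column names.
    Hypothetical helper for illustration only.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # One partition per distinct combination of values in `columns`.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    sizes = list(counts.values())
    mean_size = sum(sizes) / len(sizes)
    return {
        "partitions": len(sizes),        # number of partitions created
        "smallest": min(sizes),          # rows in the smallest partition
        "largest": max(sizes),           # rows in the largest partition
        "skew": max(sizes) / mean_size,  # 1.0 means perfectly even
    }


# Made-up sample data: most events come from one country,
# while the two dates hold the same number of rows.
rows = [
    {"country": "US", "event_date": "2021-06-01"},
    {"country": "US", "event_date": "2021-06-01"},
    {"country": "US", "event_date": "2021-06-01"},
    {"country": "DE", "event_date": "2021-06-01"},
    {"country": "US", "event_date": "2021-06-02"},
    {"country": "US", "event_date": "2021-06-02"},
    {"country": "US", "event_date": "2021-06-02"},
    {"country": "FR", "event_date": "2021-06-02"},
]

by_date = partition_stats(rows, ["event_date"])
by_country = partition_stats(rows, ["country"])
by_both = partition_stats(rows, ["event_date", "country"])

for name, stats in [("event_date", by_date),
                    ("country", by_country),
                    ("event_date + country", by_both)]:
    print(name, stats)
```

In this toy data, `event_date` splits the rows evenly, `country` is skewed toward one value, and combining the two columns doubles the partition count while shrinking the smallest partition to a single row, which is the small-files effect that multi-level partitioning can cause. Whether a given skew or partition count is acceptable depends on your data volume and engine, so treat the numbers as a comparison aid rather than a threshold.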

