How do I choose which column to partition by?

User16826992666 — Thu, 24 Jun 2021 22:06:12 GMT

I am in the process of building my data pipeline, but I am unsure of how to choose which fields in my data I should use for partitioning. What should I be considering when choosing a partitioning strategy?

Re: How do I choose which column to partition by?

brickster_2018 — Thu, 24 Jun 2021 23:22:00 GMT

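The main factors to consider when choosing partition columns are:

Even distribution of data: pick a column whose values split the data into partitions of roughly equal size.
Query patterns: choose a column that is commonly used in your queries, so that reads can be limited to the relevant partitions.
Partition depth: avoid multiple levels of partitioning, as you can end up with a large number of small files.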
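
To make the factors above concrete, here is a minimal sketch in plain Python that compares candidate partition columns on a sample of rows. It is not a Spark or Delta API: the helper partition_stats, the column names (event_date, country, user_id) and the generated sample data are all hypothetical, chosen only to illustrate the idea. In practice you would run a similar group-by count on a sample of your own table.

```python
from collections import Counter


def partition_stats(rows, columns):
    """Summarize how rows would spread across partitions keyed by `columns`.

    Hypothetical helper for illustration; not part of any library.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    sizes = list(counts.values())
    mean_size = sum(sizes) / len(sizes)
    return {
        "partitions": len(sizes),          # number of distinct partition values
        "smallest": min(sizes),            # rows in the smallest partition
        "largest": max(sizes),             # rows in the largest partition
        "skew": max(sizes) / mean_size,    # 1.0 means perfectly even
    }


# Made-up sample: 1,200 events over 4 dates and 100 users,
# with 80% of the rows coming from one country.
rows = [
    {
        "event_date": f"2021-06-{1 + i % 4:02d}",
        "country": "US" if i % 10 < 8 else "DE",
        "user_id": i % 100,
    }
    for i in range(1200)
]

by_date = partition_stats(rows, ["event_date"])                   # few, even partitions
by_country = partition_stats(rows, ["country"])                   # skewed partitions
by_user = partition_stats(rows, ["user_id"])                      # many tiny partitions
by_date_country = partition_stats(rows, ["event_date", "country"])  # two levels

for name, stats in [
    ("event_date", by_date),
    ("country", by_country),
    ("user_id", by_user),
    ("event_date + country", by_date_country),
]:
    print(name, stats)
```

On this sample, event_date gives a handful of evenly sized partitions, country is skewed toward one value, user_id produces many small partitions, and adding a second level multiplies the partition count while shrinking each partition. Whether event_date is also the right choice still depends on whether your queries actually filter on it.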