Hello Community,
Today I attended the Tips and Tricks - Optimizations webinar, and one point left me confused. They said:
"Don't partition tables <1TB in size and plan carefully when partitioning
• Partitions should be >=1GB"
My confusion is about where this recommendation applies. Is it about how the data is laid out on disk when it is written out at the end of the Spark job? Or does it also apply while a job is running, when we repartition the data during transformations to spread it across more executors so the job runs faster? For example, if I have a table close to 1TB and I want to split it across, say, 10 executors to speed the job up, should I not do that? I've put a rough sketch of the two operations I mean right below.
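To make it concrete, here is roughly what I mean by the two kinds of partitioning I might be mixing up (the table name `sales` and the column `country` are just placeholders I made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-question").getOrCreate()

# Hypothetical source table, just for illustration
df = spark.read.table("sales")

# (1) In-memory partitioning: splits the DataFrame into N partitions
# so transformations run in parallel across executor cores.
# This affects only the running job, not the files on disk.
df_parallel = df.repartition(200)

# (2) On-disk partitioning: writes one directory per distinct value
# of the partition column (Hive-style layout on storage).
df.write.partitionBy("country").mode("overwrite").saveAsTable("sales_by_country")
```

Is the ">=1GB per partition / don't partition tables under 1TB" guidance only about case (2), or also about case (1)?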
Thank you in advance for the clarification.
#dataengineering