Re: Partition in Spark

Personal1 · 10-01-2024

I read a .zip file in Spark and get unreadable data when I run show() on the data frame.
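
For context on the unreadable output: a .zip is a binary archive, so its raw bytes do not decode as text, which may be what show() is displaying. The sketch below uses only the Python standard library (no Spark). The in-memory archive, the member name data.csv, and the sample rows are all hypothetical stand-ins for my real file. It only contrasts raw archive bytes with properly extracted content.

```python
import io
import zipfile


def make_zip(member_name: str, text: str) -> bytes:
    """Build a small .zip archive in memory (hypothetical stand-in for the real file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, text)
    return buf.getvalue()


def read_member(raw: bytes, member_name: str) -> str:
    """Decompress one member of the archive and decode it as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return zf.read(member_name).decode("utf-8")


# Hypothetical sample data; the real file contents are not shown in this post.
original = "id,name\n1,alice\n2,bob\n" * 50
raw = make_zip("data.csv", original)

# Raw archive bytes begin with the zip local-file-header signature.
print(raw[:4])

# Treating the raw bytes as text yields mostly unreadable characters.
print(raw[:60].decode("utf-8", errors="replace"))

# Extracting the member first recovers the readable rows.
print(read_member(raw, "data.csv")[:22])
```

If the same pattern holds in Spark, extracting the archive first (or storing the data in a codec Spark recognizes, such as gzip) would be the step to try before reading.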

When I check the number of partitions using df.rdd.getNumPartitions(), I get 8 (the number of cores I am using). Shouldn't the partition count be just 1, since I am reading a non-splittable compressed file?

When I used only 1 core, though, I got only 1 partition.