10-01-2024 04:04 PM
I read a .zip file in Spark and get unreadable data when I run show() on the DataFrame.
When I check the number of partitions using df.rdd.getNumPartitions(), I get 8 (the number of cores I am using). Shouldn't the partition count be just 1, since I am reading a non-splittable compressed file?
When I used only 1 core, though, I got just 1 partition.
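One possible explanation, offered as an assumption rather than a confirmed diagnosis: Spark's text-based readers choose a decompression codec from the file extension, and .zip is not among the supported ones (unlike .gz or .bz2). If that is what happens here, the archive is read as a plain, splittable file, which would account for both the unreadable output and the partition count following the core count. The sketch below uses only the Python standard library. `make_zip`, `read_zip_member`, and `estimate_partitions` are hypothetical helpers, and the partition estimate is a simplified model of how Spark sizes file splits, assuming the default values of `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB) and a single input file.

```python
import io
import math
import zipfile

MB = 1024 * 1024


def make_zip(text, member="data.csv"):
    """Build a small zip archive in memory (stand-in for the real file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, text)
    return buf.getvalue()


def read_zip_member(raw, member="data.csv"):
    """Decompress one member of the archive and decode it as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return zf.read(member).decode("utf-8")


def estimate_partitions(file_bytes, parallelism,
                        max_partition_bytes=128 * MB, open_cost=4 * MB):
    """Simplified model (an assumption, not Spark's exact code path):
    estimate how many partitions one splittable file would be cut into."""
    total = file_bytes + open_cost
    bytes_per_core = total // parallelism
    max_split = min(max_partition_bytes, max(open_cost, bytes_per_core))
    return math.ceil(file_bytes / max_split)


original = "id,name\n1,alice\n2,bob\n"
raw = make_zip(original)

# The raw archive bytes start with the zip signature, not with CSV text,
# so treating them as text rows produces unreadable output.
print(raw[:4])

# Decompressing the member first recovers the real content.
print(read_zip_member(raw))

# A 100 MB file under this model: 8 partitions with 8 cores, 1 with 1 core.
print(estimate_partitions(100 * MB, parallelism=8))
print(estimate_partitions(100 * MB, parallelism=1))
```

If this assumption holds, the usual workarounds are to decompress the archive before reading it (for example, by reading it as binary and unpacking it with `zipfile`), or to store the data in a format Spark decompresses on its own, such as gzip.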