Re: Data loss after writing a transformed pyspark ...

datastones · ‎09-26-2024

1 interesting thing that I observed when playing around with different repartition size: 2,4,8,16,32,64. i have 32 cores per node. The total written result increased up to 65% of the final df at repartition = 8 (but this is very inconsistent and hard to reproduce...)