Optimizing Shuffle Partition Size in Spark for Large Joins
I am working on a Spark join between two tables of sizes 300 GB and 5 GB, respectively. After analyzing the Spark UI, I noticed the following:

- The average shuffle write partition size for th...