-werners-
Esteemed Contributor III

That is a lot of questions in one topic.

Let's give it a try:

[1] this depends on the values of the relevant parameters and on the operations your program runs (think joins, unions, repartition, etc.)

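[2] spark.default.parallelism has no single fixed default. For shuffle operations like join and reduceByKey it is the largest number of partitions in a parent RDD. For operations without a parent RDD (like parallelize) it depends on the cluster manager: the number of cores in local mode, otherwise the total number of cores on all executor nodes or 2, whichever is larger.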
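
To make the rule from the Spark configuration docs concrete, here is a simplified sketch in plain Python of how the default works out for operations without a parent RDD. The function name and the "local"/"other" labels are made up for illustration, they are not part of any Spark API, and the sketch ignores cluster managers with special defaults.

```python
def default_parallelism(total_cores, cluster_manager="other"):
    """Simplified model of the documented spark.default.parallelism default
    for operations with no parent RDD (e.g. parallelize).

    `cluster_manager` uses hypothetical labels: "local" or "other".
    """
    if cluster_manager == "local":
        # Local mode: number of cores on the local machine.
        return total_cores
    # Other cluster managers: total cores on all executor nodes, or 2,
    # whichever is larger.
    return max(total_cores, 2)


if __name__ == "__main__":
    for cores in (1, 4, 16):
        print(cores, "executor cores ->", default_parallelism(cores))
```

On a real cluster you can check the actual value with spark.sparkContext.defaultParallelism instead of computing it by hand.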

[3] Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that makes use of the runtime statistics to choose the most efficient query execution plan.

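AQE does not just decide the number of partitions: besides coalescing shuffle partitions, it can also convert a sort-merge join to a broadcast join and split skewed partitions in joins.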

https://spark.apache.org/docs/latest/sql-performance-tuning.html
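
For reference, a small sketch of the main AQE switches described on that page. The config keys are the ones from the Spark SQL tuning docs; the values (especially the 64m advisory size) are only illustrative, and the helper function is a hypothetical convenience for turning them into spark-submit arguments.

```python
# AQE-related Spark SQL settings. Keys come from the Spark SQL tuning docs;
# the values below are illustrative, not recommendations.
aqe_conf = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64m",
}


def to_submit_args(conf):
    """Hypothetical helper: render a config dict as spark-submit
    `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(aqe_conf)))
```

In a running session the same keys can be applied with spark.conf.set(key, value).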

[4] no idea, perhaps it is buffered/cached somewhere
