pantelis_mare
Contributor III

Hello,

I took some more time to investigate and try @Sandeep Chandran's idea.

I ran 4 different configurations. I cached the update table and restored the target table before each run, so the data being merged is identical every time.
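For anyone wanting to reproduce this kind of comparison, here is a rough sketch of the reset-and-rerun step. It assumes a Databricks notebook where `spark` is the active session. The table names `target` and `updates`, the join key `id`, and the version number are placeholders, and the real merge condition may well be different.

```python
def restore_sql(table: str, version: int) -> str:
    """Build the Delta RESTORE statement that rolls `table` back to `version`."""
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


def merge_sql(target: str, source: str, key: str) -> str:
    """Build a plain upsert MERGE (hypothetical condition: equality on `key`)."""
    return (
        f"MERGE INTO {target} AS t USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def run_once(spark, target: str, source: str, key: str, version: int) -> None:
    """Reset the target table, then merge the (already cached) source into it."""
    spark.sql(restore_sql(target, version))
    spark.sql(merge_sql(target, source, key))


# In a notebook (placeholder names and version):
# spark.table("updates").cache().count()   # materialize the cache once
# run_once(spark, "target", "updates", "id", version=12)
```

Restoring to the same version before every merge is what keeps the input identical across runs, so any difference in the output files should come from the configuration alone.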

Here are the files produced by each run for my BIGGEST partition, which is the one blocking the stage:

run1:

spark.databricks.delta.tuneFileSizesForRewrites: false

I suppose that in this case it tunes the file size based on the table size.

run2:

spark.databricks.delta.tuneFileSizesForRewrites: false

spark.databricks.delta.optimize.maxFileSize: 268435456

run3:

spark.databricks.delta.tuneFileSizesForRewrites: false

delta.targetFileSize = 268435456 (table property set on the target table)

run4:

spark.databricks.delta.tuneFileSizesForRewrites: true
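The four setups above can be written down as data, which makes it harder to mix them up between runs. This is only a sketch: the `RUNS` dictionary, the `apply_run` helper and the table name `target` are made up for illustration, while the configuration keys and values are the ones listed above. It assumes `spark` is the active session in a Databricks notebook.

```python
# 256 MB expressed in bytes, the value used in run2 and run3.
TARGET_FILE_SIZE = 256 * 1024 * 1024  # 268435456

TUNE_KEY = "spark.databricks.delta.tuneFileSizesForRewrites"

# Session-level settings ("conf") and table properties ("tblproperties") per run.
RUNS = {
    "run1": {"conf": {TUNE_KEY: "false"}, "tblproperties": {}},
    "run2": {
        "conf": {
            TUNE_KEY: "false",
            "spark.databricks.delta.optimize.maxFileSize": str(TARGET_FILE_SIZE),
        },
        "tblproperties": {},
    },
    "run3": {
        "conf": {TUNE_KEY: "false"},
        "tblproperties": {"delta.targetFileSize": str(TARGET_FILE_SIZE)},
    },
    "run4": {"conf": {TUNE_KEY: "true"}, "tblproperties": {}},
}


def apply_run(spark, run_name: str, table: str) -> None:
    """Apply the session settings and table properties of one run."""
    run = RUNS[run_name]
    for key, value in run["conf"].items():
        spark.conf.set(key, value)
    for key, value in run["tblproperties"].items():
        spark.sql(f"ALTER TABLE {table} SET TBLPROPERTIES ('{key}' = '{value}')")


# In a notebook (placeholder table name):
# apply_run(spark, "run3", "target")
```

Note that this sketch does not unset anything left over from an earlier run, so a session setting or table property from a previous configuration has to be cleared separately (or the runs done in fresh sessions) for the comparison to be fair.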

As extra info, here are the record counts per partition. As you can see, my DataFrame is highly unbalanced.

count