Why is Delta Lake creating a 238.0TiB shuffle on merge?

JordanYaker — Fri, 24 Feb 2023 16:28:53 GMT

I'm frankly at a loss here. I have a task that consistently performs terribly. I took some time this morning to try to debug it, and the physical plan shows a 238 TiB shuffle:

== Physical Plan ==
AdaptiveSparkPlan (40)
+- == Current Plan ==
   SerializeFromObject (22)
   +- MapPartitions (21)
      +- DeserializeToObject (20)
         +- Project (19)
            +- ObjectHashAggregate (18)
               +- Exchange (17)
                  +- ObjectHashAggregate (16)
                     +- ObjectHashAggregate (15)
                        +- ShuffleQueryStage (14), Statistics(sizeInBytes=238.0 TiB)
                           +- Exchange (13)
                              +- ObjectHashAggregate (12)
                                 +- * Project (11)
                                    +- CartesianProduct Inner (10)
                                       :- * Project (5)
                                       :  +- * Filter (4)
                                       :     +- * Project (3)
                                       :        +- * ColumnarToRow (2)
                                       :           +- Scan parquet  (1)
                                       +- * Project (9)
                                          +- * Project (8)
                                             +- * ColumnarToRow (7)
                                                +- Scan parquet  (6)

I could understand this number if I were working with a lot of data, but I'm not. The Cartesian product in this query produces only 125 rows, so it's not my merge logic.

Additionally, the output table isn't very big either: it's 15 files, none larger than 10 MB (NOTE: I could definitely repartition it for a better layout, but that's another story).

I'm at my wits' end with this problem. Any ideas would be appreciated.

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

Anonymous — Fri, 24 Feb 2023 18:48:25 GMT

So I'm not sure what the problem is, but I'll walk you through my thinking and ideas.

Is the deserialize/map/serialize sequence (DeserializeToObject, MapPartitions, SerializeFromObject) coming from a case class in Scala?

How big are the two tables you're joining?

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

JordanYaker — Fri, 24 Feb 2023 18:50:28 GMT

@Joseph Kambourakis, one table is 1.5 MB. The other is about 80 MB.

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

Anonymous — Fri, 24 Feb 2023 18:54:41 GMT

Hmm, then it doesn't make sense for it to create much shuffle data at all. What does the shuffle look like in the plan? It should show how much data was written and read at that step.

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

JordanYaker — Fri, 24 Feb 2023 18:58:47 GMT

Not very big.

What's interesting is that this stage ran for 7 hours, and most of that was scheduler delay.

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

Anonymous — Fri, 24 Feb 2023 19:00:42 GMT

The input size and record counts look like what you'd expect from the table sizes, and thankfully it's not actually writing 238 TiB. That said, I'm not sure what the problem is in that stage, but something is definitely going on given how long it takes.

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

JordanYaker — Fri, 24 Feb 2023 19:02:34 GMT

At this point I'm honestly wondering whether it's just a quirk of the merge logic.

I tried running a join between the output files and what would be the input to my MERGE statement. I ran EXPLAIN on that query, and it ends up creating a BroadcastNestedLoopJoin. More often than not, nested loop joins have bedeviled my performance. I'm going to try splitting the merge into two separate calls and see if that does the trick.
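To make the cost difference concrete, here is a toy model in plain Python; it is not Spark's actual implementation. A nested-loop join evaluates the join condition for every pair of rows, so the work grows with the product of the two input sizes. A hash join builds a lookup table on one side and probes it once per row of the other, which is only possible when the condition includes an equality between the two sides. As far as I know, that is also why Spark can only pick its hash or sort-merge joins when the condition has equi-join keys, and otherwise falls back to BroadcastNestedLoopJoin or CartesianProduct. The `target`/`source` data and the function names are hypothetical.

```python
# Toy comparison of a nested-loop join and a hash join (illustration only,
# not how Spark implements BroadcastNestedLoopJoin or its hash joins).
from collections import defaultdict


def nested_loop_join(left, right, condition):
    """Evaluate `condition` for every (l, r) pair: len(left) * len(right) checks."""
    out, checks = [], 0
    for l in left:
        for r in right:
            checks += 1
            if condition(l, r):
                out.append((l, r))
    return out, checks


def hash_join(left, right, key_left, key_right):
    """Build a hash table on `right`, then probe it once per `left` row.

    Only works for equality conditions: key_left(l) == key_right(r).
    """
    table = defaultdict(list)
    for r in right:
        table[key_right(r)].append(r)
    out, probes = [], 0
    for l in left:
        probes += 1
        for r in table.get(key_left(l), []):
            out.append((l, r))
    return out, probes


if __name__ == "__main__":
    # Hypothetical data: 1,000 target rows and 2,000 source rows keyed by "id".
    target = [{"id": i} for i in range(1_000)]
    source = [{"id": i % 1_500} for i in range(2_000)]

    nl_rows, nl_checks = nested_loop_join(target, source, lambda t, s: t["id"] == s["id"])
    h_rows, h_probes = hash_join(target, source, lambda t: t["id"], lambda s: s["id"])

    assert nl_rows == h_rows  # both strategies produce the same matches
    print(f"nested loop: {nl_checks:,} condition checks")
    print(f"hash join:   {h_probes:,} probes (+{len(source):,} inserts to build)")
```

With this data the nested loop checks the condition 2,000,000 times while the hash join probes 1,000 times, and both return the same 1,500 pairs. A plausible takeaway for a MERGE, assuming its ON clause is what forces the nested loop, is to make sure that clause contains plain equality predicates on key columns so the underlying join can be planned as a hash or sort-merge join.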

It might just be that EXPLAIN on a MERGE doesn't show this because of how merges are executed.
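Since EXPLAIN on the MERGE itself may not reveal the join, one way to repeat the check described above is to explain the equivalent join and scan the printed plan for nested-loop-style operators. The sketch below is a hypothetical helper (`find_nested_loop_joins` is not a Spark API) that works on the plan text as printed by `explain()`. The two operator names it looks for are real Spark physical operators, but the list is not exhaustive. With `explain(mode="formatted")`, each operator also appears in the details section below the tree, so it may be reported twice.

```python
from __future__ import annotations

# Spark physical join operators that compare every row on one side with every
# row on the other. This is not an exhaustive list of slow operators.
NESTED_LOOP_OPERATORS = ("BroadcastNestedLoopJoin", "CartesianProduct")


def find_nested_loop_joins(plan_text: str) -> list[str]:
    """Return the plan lines that mention a nested-loop-style join operator."""
    hits = []
    for line in plan_text.splitlines():
        if any(op in line for op in NESTED_LOOP_OPERATORS):
            # Drop the tree-drawing prefix ("+- ", ":  ", "* ") for readability.
            hits.append(line.strip().lstrip(":+-* "))
    return hits


if __name__ == "__main__":
    # Excerpt of the physical plan posted at the top of this thread.
    plan = """\
+- * Project (11)
   +- CartesianProduct Inner (10)
      :- * Project (5)
      +- * Project (9)
"""
    for hit in find_nested_loop_joins(plan):
        print("nested-loop join found:", hit)
```

Run against the excerpt above, it reports `CartesianProduct Inner (10)`; an empty list means neither operator appears in the plan text.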

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

Vartika — Tue, 25 Apr 2023 11:01:19 GMT

Hi @Jordan Yaker,

Hope all is well!

Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Re: Why is Delta Lake creating a 238.0TiB shuffle on merge?

JordanYaker — Tue, 25 Apr 2023 18:42:03 GMT

It turned out to be the BroadcastNestedLoopJoin. Once I reworked my logic to remove that, the performance cleared up.