We are having some issues with merge performance, so I went and read a bit in the documentation and found this section: https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload, which says: "Databricks recommends setting the table p...
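For anyone following along, the linked section covers autotuning file sizes for merge-heavy workloads. Assuming the truncated quote refers to the delta.tuneFileSizesForRewrites table property that section describes, a minimal sketch of enabling it on an existing table (the table name events is a placeholder) would be:

# Sketch only: enables file-size autotuning for rewrite-heavy (MERGE/UPDATE) workloads.
# 'events' is a placeholder table name; the property name is taken from the linked doc page.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")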
From Databricks Runtime 10.4 LTS we have Low Shuffle Merge, so merge is faster. But what about the MERGE INTO statement that we run in a SQL notebook in Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs Databricks...
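For reference, the two forms can be written as below. Both resolve to the same underlying Delta Lake merge command, so any difference should come from the resulting plan rather than the API itself. This is a sketch with placeholder table and column names (target, source, id):

from delta.tables import DeltaTable

# SQL form, as you would run it in a SQL notebook cell:
spark.sql("""
    MERGE INTO target t
    USING source s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# PySpark form; resolves to the same Delta merge command under the hood:
(DeltaTable.forName(spark, "target").alias("t")
    .merge(spark.table("source").alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())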
I'm frankly at a loss here. I have a task that is consistently performing just awfully. I took some time this morning to try and debug it, and the physical plan is showing a 238 TiB shuffle:

== Physical Plan ==
AdaptiveSparkPlan (40)
+- == Current Plan...
I have a Delta table that is being updated nightly using Auto Loader. After the merge, the job kicks off a second notebook to clean and rewrite certain values using a series of UPDATE statements, e.g.,

UPDATE foo
SET field1 = some_value
WHERE ...
I would partition the table by some sort of date that Auto Loader can use. You could then filter your UPDATE further, and it will automatically use partition pruning and only scan the related files; see the sketch below.
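A minimal sketch of that suggestion. The table and column names (events, ingest_date, field1) are placeholders, not from the thread:

# Date-partitioned Delta table so nightly loads land in a small set of partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT,
        field1 STRING,
        ingest_date DATE
    )
    USING DELTA
    PARTITIONED BY (ingest_date)
""")

# Adding the partition column to the WHERE clause lets Delta prune to the
# partitions touched by last night's load instead of scanning the whole table.
spark.sql("""
    UPDATE events
    SET field1 = 'some_value'
    WHERE ingest_date = date_sub(current_date(), 1)
      AND field1 IS NULL
""")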
While using the MERGE INTO statement, if the source data that will be merged into the target Delta table is small enough to fit into the memory of the worker nodes, then it makes sense to broadcast the source data. By doing so, the execution can avoid the...
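A minimal sketch of broadcasting a small merge source via the PySpark API; the table names (target, updates) and join key (id) are placeholders:

from delta.tables import DeltaTable
from pyspark.sql.functions import broadcast

source_df = spark.table("updates")  # assumed small enough to fit in executor memory

(DeltaTable.forName(spark, "target").alias("t")
    # broadcast() hints Spark to ship the source to every worker, so the
    # large target side does not have to be shuffled for the join.
    .merge(broadcast(source_df).alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())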