We are having some issues with merge performance, so I went and read a bit in the documentation and found this section: https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload, which says: "Databricks recommends setting the table p...
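For anyone following along, the linked section covers autotuning file sizes for merge-heavy workloads. Assuming the truncated quote refers to the delta.tuneFileSizesForRewrites table property that section describes, a minimal sketch of enabling it on an existing table (the table name events is a placeholder) would be:

# Sketch only: enables file-size autotuning for rewrite-heavy (MERGE/UPDATE) workloads.
# 'events' is a placeholder table name; the property name is taken from the linked doc page.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")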
From Databricks Runtime 10.4 LTS we have Low Shuffle Merge, so merge is faster. But what about the MERGE INTO statement that we run in a SQL notebook in Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs Databricks...
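For reference, the two forms can be written as below. Both resolve to the same underlying Delta Lake merge command, so any difference should come from the resulting plan rather than the API itself. This is a sketch with placeholder table and column names (target, source, id):

from delta.tables import DeltaTable

# SQL form, as you would run it in a SQL notebook cell:
spark.sql("""
    MERGE INTO target t
    USING source s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# PySpark form; resolves to the same Delta merge command under the hood:
(DeltaTable.forName(spark, "target").alias("t")
    .merge(spark.table("source").alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())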
I'm frankly at a loss here. I have a task that is consistently performing just awfully. I took some time this morning to try and debug it, and the physical plan is showing a 238 TiB shuffle:

== Physical Plan ==
AdaptiveSparkPlan (40)
+- == Current Plan...
I have a Delta table that is being updated nightly using Auto Loader. After the merge, the job kicks off a second notebook to clean and rewrite certain values using a series of UPDATE statements, e.g.,

UPDATE foo
SET field1 = some_value
WHERE ...
I would partition the table by some sort of date that Auto Loader can use. You could then filter your UPDATE further, and it will automatically use partition pruning and only scan the related files; see the sketch below.
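A minimal sketch of that suggestion. The table and column names (events, ingest_date, field1) are placeholders, not from the thread:

# Date-partitioned Delta table so nightly loads land in a small set of partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT,
        field1 STRING,
        ingest_date DATE
    )
    USING DELTA
    PARTITIONED BY (ingest_date)
""")

# Adding the partition column to the WHERE clause lets Delta prune to the
# partitions touched by last night's load instead of scanning the whole table.
spark.sql("""
    UPDATE events
    SET field1 = 'some_value'
    WHERE ingest_date = date_sub(current_date(), 1)
      AND field1 IS NULL
""")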
While using the MERGE INTO statement, if the source data that will be merged into the target Delta table is small enough to fit into the memory of the worker nodes, then it makes sense to broadcast the source data. By doing so, the execution can avoid the...
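A minimal sketch of broadcasting a small merge source via the PySpark API; the table names (target, updates) and join key (id) are placeholders:

from delta.tables import DeltaTable
from pyspark.sql.functions import broadcast

source_df = spark.table("updates")  # assumed small enough to fit in executor memory

(DeltaTable.forName(spark, "target").alias("t")
    # broadcast() hints Spark to ship the source to every worker, so the
    # large target side does not have to be shuffled for the join.
    .merge(broadcast(source_df).alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())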