cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Prashant777
by New Contributor II
  • 3916 Views
  • 4 replies
  • 0 kudos

Error in SQL statement: UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My code:- CREATE OR REPLACE TEMPORARY VIEW preprocessed_source ASSELECT  Key_ID,  Distributor_ID,  Customer_ID,  Customer_Name,  ChannelFROM integr_masterdata.Customer_Master;-- Step 2: Perform the merge operation using the preprocessed source tableM...

  • 3916 Views
  • 4 replies
  • 0 kudos
Latest Reply
Tread
New Contributor II
  • 0 kudos

Hey as previously stated you could drop the duplicates of the columns that contain the said duplicates(code you can find online pretty easily), I have had this problem myself and it came when creating a temporary view from a dataframe, the dataframe ...

  • 0 kudos
3 More Replies
Graham
by New Contributor III
  • 4430 Views
  • 5 replies
  • 2 kudos

"MERGE" always slower than "CREATE OR REPLACE"

OverviewTo update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower.My question is this: Has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...

  • 4430 Views
  • 5 replies
  • 2 kudos
Latest Reply
Manisha_Jena
New Contributor III
  • 2 kudos

Hi @Graham Can you please try Low Shuffle Merge [LSM]  and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including z-order clustering) for unmodified data, while simultaneously improving performan...

  • 2 kudos
4 More Replies
Greg
by New Contributor III
  • 1145 Views
  • 1 replies
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have 1 delta table that I continuously append events into, and a 2nd delta table that I continuously merge into (streamed from the 1st table) that has unique ID's where properties are updated from the events (An ID represents a unique thing that ge...

  • 1145 Views
  • 1 replies
  • 4 kudos
Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solved this problem?

  • 4 kudos
ron_lusha
by New Contributor
  • 705 Views
  • 1 replies
  • 0 kudos

How can I know if databricks auto-detected to use tuneFileSizesForRewrites?

We are having some issues with merge performance, so I went and read a bit in the documentation, I found this section:https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload"Databricks recommends setting the table p...

  • 705 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ron Serruya​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 0 kudos
gilo12
by New Contributor III
  • 2452 Views
  • 3 replies
  • 2 kudos

merge into deletes from SOURCE

I am using the following query to make an upsert:MERGE INTO my_target_table AS target USING (SELECT MAX(__my_timestamp) AS checkpoint FROM my_source_table) AS source ON target.name = 'some_name' AND target.address = 'some_address' WHEN MATCHED AN...

  • 2452 Views
  • 3 replies
  • 2 kudos
Latest Reply
gilo12
New Contributor III
  • 2 kudos

I was using a view for my_source_table, once I changed that to be a table the issue stoped.That unblocked me, but I think Databricks has a bug with using MERGE INTO from a VIEW

  • 2 kudos
2 More Replies
ros
by New Contributor III
  • 910 Views
  • 2 replies
  • 2 kudos

merge vs MERGE INTO

from 10.4 LTS version we have low shuffle merge, so merge is more faster. But what about MERGE INTO function that we run in sql notebook of databricks. Is there any performance difference when we use databrciks pyspark ".merge" function vs databricks...

  • 910 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Roshan RC​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 2 kudos
1 More Replies
Prashant777
by New Contributor II
  • 1601 Views
  • 2 replies
  • 0 kudos

UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My Code:-- CREATE OR REPLACE TEMPORARY VIEW preprocessed_source ASSELECT  Key_ID,  Distributor_ID,  Customer_ID,  Customer_Name,  ChannelFROM integr_masterdata.Customer_Master;-- Step 2: Perform the merge operation using the preprocessed source table...

  • 1601 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prashant Joshi​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 0 kudos
1 More Replies
Raghu_Bindingan
by New Contributor III
  • 3523 Views
  • 2 replies
  • 0 kudos

Resolved! SQL Merge Statement not working

Hi I am trying to use the SQL Merge statement on databricksMERGE INTO targetUSING sourceON source.key = target.keyWHEN MATCHED UPDATE SET *WHEN NOT MATCHED INSERT *WHEN NOT MATCHED BY SOURCE DELETEThis is failing with the error [PARSE_SYNTAX_ERROR...

  • 3523 Views
  • 2 replies
  • 0 kudos
Latest Reply
Raghu_Bindingan
New Contributor III
  • 0 kudos

I was missing the THEN before UPDATE, INSERT and DELETE. This keyword is missing from the documentation on Databricks https://learn.microsoft.com/en-us/azure/databricks/delta/mergeIt now works. Thanks

  • 0 kudos
1 More Replies
ghofigjong
by New Contributor
  • 3920 Views
  • 2 replies
  • 1 kudos

Resolved! How does partition pruning work on a merge into statement?

I have a delta table that is partitioned by Year, Date and month. I'm trying to merge data to this on all three partition columns + an extra column (an ID). My merge statement is below:MERGE INTO delta.<path of delta table> oldData using df newData ...

  • 3920 Views
  • 2 replies
  • 1 kudos
Latest Reply
Umesh_S
New Contributor II
  • 1 kudos

Isn't the suggested idea only filtering the input dataframe (resulting in a smaller amount of data to match across the whole delta table) rather than prune the delta table for relevant partitions to scan?

  • 1 kudos
1 More Replies
oleole
by Contributor
  • 7807 Views
  • 1 replies
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from MS SQL background, I'm trying to write a query in Spark SQL that simply update a column value of table A (source table) by INNER JOINing a new table B with a filter.MS SQL query looks like this:UPDATE T SET T.OfferAmount = OSE.EndpointEve...

  • 7807 Views
  • 1 replies
  • 1 kudos
Latest Reply
oleole
Contributor
  • 1 kudos

Posting answer to my question:   MERGE into TempOffer VIEW USING OfferSeq OSE ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1 WHEN MATCHED THEN UPDATE set VIEW.OfferAmount = OSE.EndpointEventAmountValue;

  • 1 kudos
Hubert-Dudek
by Esteemed Contributor III
  • 562 Views
  • 1 replies
  • 10 kudos

Since databricks runtime 12.1 "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete ...

Since databricks runtime 12.1 "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete all target rows which doesn't match any source.

Screenshot 2023-01-24 130504
  • 562 Views
  • 1 replies
  • 10 kudos
Latest Reply
jose_gonzalez
Moderator
  • 10 kudos

Thank you for sharing @Hubert Dudek​ 

  • 10 kudos
Netty
by New Contributor III
  • 3228 Views
  • 5 replies
  • 7 kudos

Resolved! What's the easiest way to upsert data into a table? (Azure ADLS Gen2)

I had been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below). insert overwrite table new_clicks_table partition(client_id, mm_date) select click_id ,user_id ,click_timestamp_gmt ,ca...

  • 3228 Views
  • 5 replies
  • 7 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 7 kudos

Below code might help youPython- (df.write .mode("overwrite") .option("partitionOverwriteMode", "dynamic") .saveAsTable("default.people10m") )   SQL- SET spark.sql.sources.partitionOverwriteMode=dynamic; INSERT OVERWRITE TABLE default.people10m...

  • 7 kudos
4 More Replies
Merchiv
by New Contributor III
  • 2616 Views
  • 3 replies
  • 1 kudos

Resolved! How to use uuid in SQL merge into statement

I have a Merge into statement that I use to update existing entries or create new entries in a dimension table based on a natural business key.When creating new entries I would like to also create a unique uuid for that entry that I can use to crossr...

  • 2616 Views
  • 3 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

you might wanna look into an identity column, which is possible now in delta lake.https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html

  • 1 kudos
2 More Replies
J_M_W
by Contributor
  • 1874 Views
  • 3 replies
  • 5 kudos

Resolved! Databricks is automatically creating a _apply_changes_storage table in the database when using apply_changes for Delta Live Tables

Hi there,I am using apply_changes (aka. Delta Live Tables Change Data Capture) and it works fine. However, it seems to automatically create a secondary table in the database metastore called _apply_storage_changes_{tableName}So for every table I use ...

image image
  • 1874 Views
  • 3 replies
  • 5 kudos
Latest Reply
J_M_W
Contributor
  • 5 kudos

Hi - Thanks @Hubert Dudek​ I will look into disabling access for the users!

  • 5 kudos
2 More Replies
William_Scardua
by Valued Contributor
  • 3229 Views
  • 7 replies
  • 3 kudos

uuid in Merge

Hi guys,I'm trying to use uuid in the merge but I always get an error...import uuid   ( df_events.alias("events").merge( source = df_updates.alias("updates"), condition = "events.cod = updates.cod and events.num = updates.num" ).whenMatch...

  • 3229 Views
  • 7 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @William Scardua​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

  • 3 kudos
6 More Replies
Labels