I have a delta table that is partitioned by Year, Date, and Month. I'm trying to merge data into it on all three partition columns plus an extra column (an ID). My merge statement is below:

MERGE INTO delta.<path of delta table> oldData
using df newData ...
Isn't the suggested idea only filtering the input dataframe (resulting in less data to match against the whole delta table), rather than pruning the delta table so it scans only the relevant partitions?
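For what it's worth, in my understanding pruning kicks in when the partition columns appear directly in the merge condition, so the target scan can skip non-matching partitions. A minimal sketch, assuming newData is the incoming DataFrame and reusing the placeholder path and column names from the question above:

from delta.tables import DeltaTable

# The path and column names below are placeholders from the question;
# putting the partition columns in the condition lets Delta prune the
# target instead of scanning every partition.
target = DeltaTable.forPath(spark, "<path of delta table>")
(
    target.alias("oldData")
    .merge(
        newData.alias("newData"),
        "oldData.Year = newData.Year "
        "AND oldData.Month = newData.Month "
        "AND oldData.Date = newData.Date "
        "AND oldData.Id = newData.Id",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)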
Hey, as previously stated, you could drop the duplicates on the columns that contain them (code for this is easy to find online). I have had this problem myself, and it came up when creating a temporary view from a dataframe; the dataframe ...
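A minimal sketch of that de-duplication step, assuming hypothetical key columns "cod" and "num" are the ones carrying the duplicates:

# df_updates is the source DataFrame (hypothetical name); keep one row per
# key so MERGE never sees multiple source rows matching the same target row.
deduped_updates = df_updates.dropDuplicates(["cod", "num"])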
Overview: To update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower. My question is this: has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...
Hi @Graham, can you please try Low Shuffle Merge (LSM) and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including Z-order clustering) for unmodified data, while simultaneously improving performan...
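If I remember correctly, on runtimes before 10.4 (where LSM later became the default) it had to be switched on explicitly; a sketch, assuming the preview configuration key:

# Enable Low Shuffle Merge for the current session (DBR 9.0 - 10.3;
# on DBR 10.4+ it is the default and no flag should be needed).
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")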
I have one delta table that I continuously append events into, and a second delta table that I continuously merge into (streamed from the first table) that has unique IDs whose properties are updated from the events (an ID represents a unique thing that ge...
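The usual pattern for the second table is a foreachBatch upsert; a sketch under assumed paths and an assumed "id" key column:

from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # De-duplicate within the micro-batch so the merge sees one row per ID.
    latest = batch_df.dropDuplicates(["id"])
    (
        DeltaTable.forPath(spark, "/mnt/delta/entities")  # hypothetical path
        .alias("t")
        .merge(latest.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream.format("delta")
    .load("/mnt/delta/events")  # hypothetical path to the append-only table
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/mnt/checkpoints/entities")
    .start()
)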
We are having some issues with merge performance, so I went and read a bit of the documentation and found this section: https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload

"Databricks recommends setting the table p...
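The property that section describes can be set like this; "my_table" is a placeholder:

# Let Delta tune file sizes for merge-heavy workloads, per the linked docs.
spark.sql("""
    ALTER TABLE my_table
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")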
I am using the following query to make an upsert:

MERGE INTO my_target_table AS target
USING (SELECT MAX(__my_timestamp) AS checkpoint FROM my_source_table) AS source
ON target.name = 'some_name'
AND target.address = 'some_address'
WHEN MATCHED AN...
I was using a view for my_source_table; once I changed that to be a table, the issue stopped. That unblocked me, but I think Databricks has a bug with using MERGE INTO from a VIEW.
From the 10.4 LTS version onwards we have low shuffle merge, so merge is faster. But what about the MERGE INTO statement that we run in a SQL notebook in Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs the Databricks...
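As far as I know, both forms compile down to the same Delta MERGE command, so optimizations such as low shuffle merge apply to either; a sketch with hypothetical table names:

from delta.tables import DeltaTable

# SQL form (what you would run in a SQL notebook cell):
spark.sql("""
    MERGE INTO target t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Equivalent PySpark form:
(
    DeltaTable.forName(spark, "target").alias("t")
    .merge(spark.table("updates").alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)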
Hi @Roshan RC, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...
Hi @Prashant Joshi, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
Hi, I am trying to use the SQL MERGE statement on Databricks:

MERGE INTO target
USING source
ON source.key = target.key
WHEN MATCHED UPDATE SET *
WHEN NOT MATCHED INSERT *
WHEN NOT MATCHED BY SOURCE DELETE

This is failing with the error [PARSE_SYNTAX_ERROR...
I was missing the THEN before UPDATE, INSERT, and DELETE. This keyword is missing from the documentation on Databricks: https://learn.microsoft.com/en-us/azure/databricks/delta/merge. It now works. Thanks.
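For anyone else hitting the same parse error, the corrected statement with THEN added before each action (wrapped in spark.sql here just to keep the examples in one language):

spark.sql("""
    MERGE INTO target
    USING source
    ON source.key = target.key
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    WHEN NOT MATCHED BY SOURCE THEN DELETE
""")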
Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this:

UPDATE T
SET T.OfferAmount = OSE.EndpointEve...
Posting the answer to my question:

MERGE INTO TempOffer VIEW
USING OfferSeq OSE ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;
Since Databricks Runtime 12.1, "WHEN NOT MATCHED BY SOURCE" has been part of the MERGE syntax. For example, using that option, we can quickly delete all target rows that don't match any source row.
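A sketch of that option with hypothetical table names; target rows that no source row matches are deleted, while the rest are upserted as usual:

# Requires DBR 12.1+; rows in dim_customers with no match in
# staged_customers are removed by the final clause.
spark.sql("""
    MERGE INTO dim_customers t
    USING staged_customers s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    WHEN NOT MATCHED BY SOURCE THEN DELETE
""")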
I had been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below).

insert overwrite table new_clicks_table partition(client_id, mm_date)
select
click_id
,user_id
,click_timestamp_gmt
,ca...
I have a MERGE INTO statement that I use to update existing entries or create new entries in a dimension table based on a natural business key. When creating new entries, I would like to also create a unique uuid for that entry that I can use to crossr...
You might want to look into an identity column, which is possible now in Delta Lake: https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html
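A sketch of the DDL from that post, with a hypothetical dimension table; the surrogate key is generated automatically on insert, so the MERGE never has to supply it:

spark.sql("""
    CREATE TABLE dim_product (
        product_sk BIGINT GENERATED ALWAYS AS IDENTITY,
        business_key STRING,
        product_name STRING
    ) USING DELTA
""")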
Hi there, I am using apply_changes (aka Delta Live Tables Change Data Capture) and it works fine. However, it seems to automatically create a secondary table in the database metastore called _apply_storage_changes_{tableName}. So for every table I use ...
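For context, a minimal apply_changes pipeline looks roughly like this on recent runtimes (table and column names are hypothetical); as far as I can tell, the hidden backing table is created per target and is where DLT keeps the state it needs to apply out-of-order changes:

import dlt
from pyspark.sql.functions import col

# The target is what you query; DLT maintains an internal backing table
# per target as an implementation detail.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",
    keys=["customer_id"],
    sequence_by=col("event_ts"),
)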
Hi guys, I'm trying to use uuid in the merge but I always get an error...

import uuid
(
df_events.alias("events").merge(
source = df_updates.alias("updates"),
condition = "events.cod = updates.cod and events.num = updates.num"
).whenMatch...
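In case it helps: a common cause of errors here is generating the uuid on the Python side, since uuid.uuid4() is evaluated once on the driver (and isn't a Column), while the SQL uuid() function produces a fresh value per row. A sketch of the insert clause under that assumption, with a hypothetical "id" column and the column names from the snippet above:

from pyspark.sql.functions import expr

(
    df_events.alias("events")
    .merge(
        source=df_updates.alias("updates"),
        condition="events.cod = updates.cod AND events.num = updates.num",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsert(
        values={
            "id": expr("uuid()"),   # fresh uuid per inserted row
            "cod": expr("updates.cod"),
            "num": expr("updates.num"),
        }
    )
    .execute()
)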
Hi @William Scardua, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...