- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 12:32 AM
Does Spark support multi statement writes to Snowflake in a single session? To elaborate, I have a requirement where I need to do
- A selective deletion of data from a Snowflake table and
- Insert records to Snowflake table ( Ranges from around 1 M rows)
I want to wrap these both into a single transaction so that I could get Consistency here. I can't use overwrite mode as it will overwrite my entire table in Snowflake. I have seen runQuery of sfutils support multiStatements but it is more to handle data within Snowflake and not from Databricks to Snowflake.
Thanks,
Vignesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 02:53 AM
are you looking for something like a merge (upsert) or a replaceWhere?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 04:16 AM
Yes. It's kind of replace where. My input tables are in Databricks layer. So I join them and write to a dataframe. Then I want to write this dataframe to Snowflake table by selectively deleting certain data as I have separate pipeline for different regions. So my overwrite on Snowflake table should only be for that region.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 04:40 AM
The only way this can be done in a single command is if snowflake has something like a replaceWhere. I did some searching but did not find anything.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 04:45 AM
Yes. I have tried the DELETE command using sfutils runQuery and then my data insertion using df.write method. They are captured in different sessions with two different session id in Snowflake. I basically want to wrap these two statements as a single transaction
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 05:15 AM
if snowflake does not have the possibility, it is not possible in spark either.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 04:40 AM
Hi @Vigneshraja Palaniraj , Could you please check the below document if this meets your use case
https://docs.databricks.com/sql/language-manual/delta-merge-into.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2023 04:42 AM
Hi @Lakshay Goel . Thanks for your response. This option is more about when you want to update your delta table that is existing in Databricks layer itself. But I'm looking for the option when I want to write to Snowflake with dataframe write API using Spark snowflake connector
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-21-2023 05:16 AM
In my analysis, I got the below understanding
- If your data is sitting in Snowflake and you have a set of DDL/DML queries that need to wrapped into a single transaction, you can use MULTI_STATEMENT option to 0 and use snowflake utils runQuery method to run them as single transaction. All these command will be executed in Snowflake with a single session id
- If the requirement is to write a dataframe computed from Databricks to Snowflake along with DDL command, then they will be executed as two different transactions with two different session ids. So not really possible out of the box to wrap them to a single session.