Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflakeT...
We ended up using device flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code; you open the URL in a new tab an...
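For reference, once the device flow has handed you an access token, the Snowflake Spark connector can authenticate with it directly. A minimal sketch, assuming you already hold the token in `oauth_token` and that the placeholder account/warehouse values are filled in for your environment:

```python
# Sketch: read from Snowflake using an OAuth token obtained out-of-band
# (e.g. via the device flow described above). Placeholders in <...>.
options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "sfAuthenticator": "oauth",  # authenticate with a token instead of a password
    "sfToken": oauth_token,      # token from the device flow
}

df = (spark.read.format("snowflake")
      .options(**options)
      .option("query", "select current_user()")
      .load())
```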
When I am trying to read a Snowflake table from my Databricks notebook, it gives an error. The code:

```python
df1.read.format("snowflake") \
    .options(**options) \
    .option("query", "select * from abc") \
    .save()
```

Getting the error below:

java.sql.SQLException: No suitable dri...
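Note that the snippet above mixes the read and write APIs: a Snowflake read goes through `spark.read` and ends with `.load()`, not `df1.read ... .save()`. A minimal sketch of the usual pattern, assuming `options` holds the connection settings:

```python
# Read the result of a query from Snowflake; note load(), not save().
df1 = (spark.read.format("snowflake")
       .options(**options)                       # sfUrl, sfUser, credentials, etc.
       .option("query", "select * from abc")
       .load())
```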
Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:

```sql
select
  col1,
  col2,
  last_value(col2) over (partition by col1 order by col2) as column2_last
from values
...
```
For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the syntax below. I understand l...
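The underlying reason: with an ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value returns the current row's value. A sketch of the explicit frame that returns the true last value per partition (table name is a placeholder, run via spark.sql):

```python
# Extending the frame to the whole partition makes last_value behave as expected.
df = spark.sql("""
    select
      col1,
      col2,
      last_value(col2) over (
        partition by col1
        order by col2
        rows between unbounded preceding and unbounded following
      ) as column2_last
    from my_table  -- placeholder table
""")
```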
My job, after doing all the processing in the Databricks layer, writes the final output to Snowflake tables using the df.write API and the Spark Snowflake connector. I often see that even a small dataset (16 partitions and 20k rows in each partition) takes a...
There are a few options I tried out which gave me better performance. Caching the intermediate or final results so that the dataframe computation does not repeat while writing. Coalescing the results into partitions 1x or 0.5x your number...
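A minimal sketch combining the caching and coalescing tips above, assuming `options` holds the Snowflake connection settings and the table name is a placeholder:

```python
# Cache so the lineage is not recomputed during the write,
# then reduce the partition count before handing rows to the connector.
final_df = final_df.cache()
final_df.count()  # materialize the cache

(final_df
 .coalesce(16)                       # roughly 1x the executor core count, per the tip above
 .write.format("snowflake")
 .options(**options)
 .option("dbtable", "TARGET_TABLE")  # placeholder target table
 .mode("append")
 .save())
```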
Does Spark support multi-statement writes to Snowflake in a single session? To elaborate, I have a requirement where I need to do a selective deletion of data from a Snowflake table and insert records into a Snowflake table (ranges from around 1M rows)...
In my analysis I got the following understanding: if your data is sitting in Snowflake and you have a set of DDL/DML queries that need to be wrapped into a single transaction, you can set the MULTI_STATEMENT_COUNT parameter to 0 and use the Snowflake Utils runQuery method t...
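A sketch of that pattern, calling the connector's Scala Utils class from PySpark through the JVM gateway (a commonly used approach; table names and the delete predicate are placeholders, and it assumes MULTI_STATEMENT_COUNT has been set to 0 for the user/session as described above):

```python
# Run DDL/DML directly in Snowflake via the connector's Utils class.
# `options` is the same dict of connection settings used for reads/writes.
sf_utils = spark._jvm.net.snowflake.spark.snowflake.Utils

sf_utils.runQuery(options, """
    begin;
    delete from target_table where load_date = '2023-01-01';  -- placeholder predicate
    insert into target_table select * from staging_table;     -- placeholder source
    commit;
""")
```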
Hello all, I've been experiencing the error described below, where I try to query a table from Snowflake which is about ~5.5B rows and ~30 columns, and it fails almost systematically; specifically, either the Spark job doesn't even start or I get the ...
Hey there @hamzatazib96, does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
I have a job running with no issues on Databricks Runtime 7.3 LTS. When I upgraded to 8.3 it fails with the error: An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driv...
Adding to @Sean Owen's comments: the only reason this was working before is that the optimizer was evaluating it locally, rather than creating a context on the executors and evaluating it there.
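The usual culprit behind this error is a UDF that references `spark` or the SparkContext, which only exist on the driver. A minimal sketch of the anti-pattern and the fix (names are illustrative):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Broken: the lambda closes over `spark`, which is not available on executors.
# bad_udf = udf(lambda x: spark.conf.get("spark.app.name") + x, StringType())

# Fixed: resolve the value on the driver and capture only the plain string.
app_name = spark.conf.get("spark.app.name")  # evaluated driver-side
good_udf = udf(lambda x: app_name + "-" + x, StringType())

df = spark.createDataFrame([("a",), ("b",)], ["x"])
df.select(good_udf("x").alias("tagged")).show()
```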
How do I get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument in SQL, and this SQL gets the records from a Snowflake database based on this timestamp.
Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members. Cheers!
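One way to approach the question above is the Jobs API `runs/list` endpoint, which returns recent runs with their start times. A sketch, where the workspace URL, token, and job id are placeholders for your environment:

```python
import requests

# Fetch the most recent run of a job via the Jobs API 2.1.
host = "https://<workspace-url>"
headers = {"Authorization": "Bearer <personal-access-token>"}

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"job_id": 123, "limit": 1},  # 123 is a placeholder job id
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
if runs:
    last_run_ts = runs[0]["start_time"]  # epoch milliseconds
    print(last_run_ts)
```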
Hi everyone, I am working with Databricks Notebooks and I am facing an issue with the Snowflake connector: I want to use DDL/DML with the Snowflake connector. Can someone please help me out with this? Thanks in advance!
Hello, I've noticed that collect_set and collect_list are not pushed down to the database. Runtime: DB 9.1 LTS, Spark 3.1.2; database: Snowflake. Is there any way to get a distinct set from a group by in a way that will push the query down to the database?
Hm, so collect_set does not get translated to LISTAGG. Can you try the following?
- use a more recent version of DBR
- use Delta Lake as the Spark source
- use the latest version of the Snowflake connector
- check if pushdown to Snowflake is enabled (see the sketch below)
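On the last point, the connector exposes a session-level pushdown switch. A sketch of enabling it from PySpark via the JVM gateway, a commonly cited pattern (verify the class path against your connector version; pushdown is on by default in recent releases, so this just makes the setting explicit):

```python
# Explicitly enable query pushdown for the current Spark session.
spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(
    spark._jsparkSession
)
```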
Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...
Hi, I am wondering what documentation exists on query pushdown in Snowflake. I noticed that a single function (monotonically_increasing_id()) prevented the entire query from being pushed down to Snowflake during an ETL process. Is pushdown coming from the S...
Hi Sam, the Spark connector applies predicate and query pushdown by capturing and analyzing the Spark logical plans for SQL operations. When the data source is Snowflake, the operations are translated into a SQL query and then executed in Snowflake to...
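A practical way to check whether a given plan was pushed down is to inspect the physical plan: when pushdown succeeds, the plan shows a single Snowflake scan carrying the generated SQL, whereas untranslatable expressions such as monotonically_increasing_id() fall back to Spark-side operators. A sketch, with `options` and the table name as placeholders:

```python
from pyspark.sql import functions as F

df = (spark.read.format("snowflake")
      .options(**options)             # connection settings as elsewhere
      .option("dbtable", "my_table")  # placeholder table
      .load())

# Pushed down: the filter and aggregate should appear inside the generated Snowflake query.
df.filter(F.col("col1") > 10).groupBy("col2").count().explain()

# Not pushed down: this expression forces Spark-side evaluation of the plan.
df.withColumn("row_id", F.monotonically_increasing_id()).explain()
```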
The open source Spark connector for Snowflake is available by default in the Databricks runtime. To connect you can use the following code:

```python
# Use secrets DBUtil to get Snowflake credentials.
user = dbutils.secrets.get("<scope>", "<secret key>")
passw...
```
I'm using the Databricks-Snowflake connector to load data into a Snowflake table. Can someone point me to an example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...
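The straightforward approach is to select only the columns you want to load and append, which works as long as the remaining target columns are nullable or have defaults; the connector also documents a column_mapping parameter for matching columns by name rather than position. A sketch, with table and column names as placeholders:

```python
# Append a subset of columns; the target's other columns must be
# nullable or carry defaults.
(df.select("id", "name", "updated_at")      # placeholder column subset
   .write.format("snowflake")
   .options(**options)
   .option("dbtable", "TARGET_TABLE")       # placeholder target table
   .option("column_mapping", "name")        # match columns by name, not position
   .mode("append")
   .save())
```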