Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

by ajbush, New Contributor III
  • 17026 Views
  • 8 replies
  • 2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflakeT...

Latest Reply
BobGeor_68322
New Contributor II
  • 2 kudos

We ended up using device flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code, and you open the URL in a new tab an...
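To make the device-flow approach concrete, here is a minimal sketch (not from the thread) that uses the MSAL library to run the device-code flow against Azure AD and then hands the resulting token to the Snowflake Spark connector; every `<...>` value (client ID, tenant, scope, account, user) is a placeholder you would replace with your own:

```python
# Hedged sketch: device-code OAuth via MSAL, then token auth in the connector.
import msal

app = msal.PublicClientApplication(
    "<azure-ad-app-client-id>",
    authority="https://login.microsoftonline.com/<tenant-id>",
)

# Prints a URL and a one-time code; open the URL in another browser tab.
flow = app.initiate_device_flow(scopes=["<snowflake-oauth-scope>"])
print(flow["message"])

# Blocks until the device-code sign-in completes in the browser.
result = app.acquire_token_by_device_flow(flow)

options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<sso-user>@<domain>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "sfAuthenticator": "oauth",          # token-based auth, no browser needed
    "sfToken": result["access_token"],
}

df = (spark.read.format("snowflake")
      .options(**options)
      .option("query", "select current_user()")
      .load())
```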

7 More Replies
by Madman, New Contributor II
  • 13472 Views
  • 5 replies
  • 6 kudos

Snowflake connection to Databricks error

When I am trying to read a Snowflake table from my Databricks notebook, it is giving the error as:
df1.read.format("snowflake") \
    .options(**options) \
    .option("query", "select * from abc") \
    .save()
Getting the below error:
java.sql.SQLException: No suitable dri...
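As an aside (this correction is not from the thread): the snippet mixes the read and write paths, since a read must begin at spark.read and end with .load(), while .save() belongs to df.write; "No suitable driver" is the standard java.sql message when no registered JDBC driver accepts the URL. A minimal corrected read sketch, with placeholder connection values:

```python
# Sketch of the corrected read; <...> connection values are placeholders.
options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

df1 = (spark.read.format("snowflake")
       .options(**options)
       .option("query", "select * from abc")
       .load())  # .load() ends a read; .save() belongs to df.write
```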

Latest Reply
pdiegop
New Contributor II
  • 6 kudos

@anurag2192 did you manage to solve it?

4 More Replies
by alexisjohnson, New Contributor III
  • 11149 Views
  • 5 replies
  • 6 kudos

Resolved! Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:
select col1, col2,
       last_value(col2) over (partition by col1 order by col2) as column2_last
from values ...

Latest Reply
Carv
Visitor II
  • 6 kudos

For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax. I understand l...
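The truncated "below syntax" most likely refers to widening the window frame (an assumed reconstruction, not the reply's exact code): with an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value only sees rows up to the current one; spelling out a full-partition frame returns the true last value.

```python
# Sketch: explicit full-partition frame so last_value sees every row.
df = spark.sql("""
    SELECT col1, col2,
           last_value(col2) OVER (
               PARTITION BY col1
               ORDER BY col2
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS column2_last
    FROM VALUES ('a', 1), ('a', 2), ('b', 5), ('b', 9) AS t(col1, col2)
""")
df.show()  # column2_last is now 2 for col1 = 'a' and 9 for col1 = 'b'
```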

4 More Replies
by Khalil, Contributor
  • 2108 Views
  • 0 replies
  • 0 kudos

Snowpark vs Spark on Databricks

Why / when should we choose Spark on Databricks over Snowpark if the data we are processing resides in Snowflake?

by pvignesh92, Honored Contributor
  • 7469 Views
  • 6 replies
  • 2 kudos

Resolved! Optimizing Writes from Databricks to Snowflake

My job, after doing all the processing in the Databricks layer, writes the final output to Snowflake tables using the df.write API and the Spark Snowflake connector. I often see that even a small dataset (16 partitions and 20k rows in each partition) takes a...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

There are a few options I tried out which gave me better performance:
  • Caching the intermediate or final results, so that the dataframe computation does not repeat again while writing.
  • Coalescing the results into partitions 1x or 0.5x your number...
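A short sketch of those two tweaks (illustrative only; the table name and partition count are assumptions, and `options` is the connector options dict from the post): cache and materialize the final DataFrame so the write does not recompute its lineage, then coalesce before writing.

```python
# Sketch: cache + coalesce before a Snowflake write; names are placeholders.
final_df = final_df.cache()
final_df.count()  # force materialization so the write reuses the cache

(final_df
    .coalesce(16)  # ~1x (or 0.5x) the executor core count, per the reply
    .write.format("snowflake")
    .options(**options)
    .option("dbtable", "TARGET_TABLE")
    .mode("append")
    .save())
```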

5 More Replies
by pvignesh92, Honored Contributor
  • 15781 Views
  • 8 replies
  • 0 kudos

Resolved! Multi Statement Writes from Spark to Snowflake

Does Spark support multi-statement writes to Snowflake in a single session? To elaborate, I have a requirement where I need to do a selective deletion of data from a Snowflake table and insert records into a Snowflake table (ranges from around 1M rows)...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

In my analysis, I got the below understanding: if your data is sitting in Snowflake and you have a set of DDL/DML queries that need to be wrapped into a single transaction, you can set the MULTI_STATEMENT option to 0 and use the Snowflake Utils runQuery method t...
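A sketch of that approach (hedged: the Utils.runQuery pattern appears in Databricks/Snowflake examples, but verify the py4j call and the MULTI_STATEMENT_COUNT parameter against your connector version; table names are placeholders, and `options` is the usual connector options dict):

```python
# Sketch: run delete + insert as one multi-statement Snowflake transaction.
sf_utils = spark._jvm.net.snowflake.spark.snowflake.Utils

# MULTI_STATEMENT_COUNT = 0 allows an unbounded number of statements in a
# single query text (an assumption based on Snowflake's documentation).
ms_options = {**options, "MULTI_STATEMENT_COUNT": "0"}

sf_utils.runQuery(ms_options, """
    BEGIN;
    DELETE FROM target_table WHERE load_date = '2024-01-01';
    INSERT INTO target_table SELECT * FROM staging_table;
    COMMIT;
""")
```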

7 More Replies
by hamzatazib96, New Contributor III
  • 2225 Views
  • 1 reply
  • 1 kudos

Snowflake/GCP error: Premature end of chunk coded message body: closing chunk expected

Hello all, I've been experiencing the error described below, where I try to query a table from Snowflake which is about ~5.5B rows and ~30 columns, and it fails almost systematically; specifically, either the Spark job doesn't even start or I get the ...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hey there @hamzatazib96! Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

by NicolasEscobar, New Contributor II
  • 9296 Views
  • 7 replies
  • 5 kudos

Resolved! Job fails after runtime upgrade

I have a job running with no issues on Databricks Runtime 7.3 LTS. When I upgraded to 8.3 it fails with the error: An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driv...

Latest Reply
User16873042682
New Contributor II
  • 5 kudos

Adding to @Sean Owen's comments: the only reason this is working is that the optimizer is evaluating this locally rather than creating a context on the executors and evaluating it.
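An illustrative sketch of that failure mode (not from the thread): a UDF whose closure reaches back to the SparkContext only "works" when the optimizer happens to evaluate it on the driver; resolving the driver-only value first keeps the closure serializable.

```python
# Sketch: driver-only state must be resolved before it enters a UDF closure.
from pyspark.sql.functions import lit, udf
from pyspark.sql.types import StringType

# Anti-pattern: the closure drags the SparkContext to the executors,
# raising pyspark.serializers.SerializationError on newer runtimes.
# bad = udf(lambda x: spark.sparkContext.appName + x, StringType())

# Safe: capture the plain value on the driver, close over that instead.
app_name = spark.sparkContext.appName
good = udf(lambda x: app_name + x, StringType())

spark.range(3).withColumn("tagged", good(lit("_suffix"))).show()
```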

6 More Replies
by sgannavaram, New Contributor III
  • 9343 Views
  • 6 replies
  • 4 kudos

Resolved! How to get the last (previous) Databricks job run time?

How to get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument in SQL, and this SQL gets the records from a Snowflake database based on this timestamp.
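One possible approach (an assumption, not an answer from the thread) is to ask the Jobs API for the previous completed run of the job and feed its start time into the Snowflake query; the workspace host, job ID, and secret names below are placeholders.

```python
# Sketch: fetch the previous run's start time via the Jobs 2.1 API.
import requests

host = "https://<workspace>.azuredatabricks.net"
token = dbutils.secrets.get("<scope>", "<databricks-pat-key>")

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"job_id": 123, "completed_only": "true", "limit": 2},  # placeholder job ID
)
runs = resp.json().get("runs", [])

# runs[0] is the most recent run; runs[1] is the previous one.
last_run_ts_ms = runs[1]["start_time"] if len(runs) > 1 else 0
```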

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Srinivas Gannavaram! Hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members. Cheers!

5 More Replies
by SajiD, New Contributor
  • 1259 Views
  • 0 replies
  • 0 kudos

Snowflake Connector for Databricks

Hi everyone, I am working with Databricks notebooks and I am facing an issue with the Snowflake connector: I want to use DDL/DML with it. Can someone please help me out with this? Thanks in advance!

by Sam, New Contributor III
  • 1279 Views
  • 1 reply
  • 4 kudos

collect_set / collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database.
Runtime: DBR 9.1 LTS
Spark: 3.1.2
Database: Snowflake
Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm, so collect_set does not get translated to LISTAGG. Can you try the following?
  • use a more recent version of DBR
  • use Delta Lake as the Spark source
  • use the latest version of the Snowflake connector
  • check if pushdown to Snowflake is enabled
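If none of those steps makes the aggregate push down, a hedged fallback is to hand Snowflake the aggregation yourself through the connector's query option, using Snowflake's LISTAGG(DISTINCT ...); the table and column names below are illustrative, and `options` is the usual connector options dict.

```python
# Sketch: force the distinct aggregation to run inside Snowflake.
df = (spark.read.format("snowflake")
      .options(**options)
      .option("query", """
          SELECT col1, LISTAGG(DISTINCT col2, ',') AS col2_set
          FROM my_table
          GROUP BY col1
      """)
      .load())
```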

by marchello, New Contributor III
  • 6108 Views
  • 8 replies
  • 3 kudos

Resolved! error on connecting to Snowflake

Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...

Latest Reply
Dan_Z
Databricks Employee
  • 3 kudos

@marchello I suggest you contact Snowflake to move forward on this one.

7 More Replies
by Sam, New Contributor III
  • 3587 Views
  • 1 reply
  • 1 kudos

Resolved! Query Pushdown in Snowflake

Hi, I am wondering what documentation exists on query pushdown in Snowflake. I noticed that a single function (monotonically_increasing_id()) prevented the entire query being pushed down to Snowflake during an ETL process. Is pushdown coming from the S...

Latest Reply
siddhathPanchal
Databricks Employee
  • 1 kudos

Hi Sam, the Spark connector applies predicate and query pushdown by capturing and analyzing the Spark logical plans for SQL operations. When the data source is Snowflake, the operations are translated into a SQL query and then executed in Snowflake to...
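To see which operator blocks pushdown, it can help to toggle pushdown off and compare plans. A sketch following the patterns in Snowflake's and Databricks' examples (verify against your connector version; the table name is a placeholder):

```python
# Sketch: disable pushdown for the session, or per-read via autopushdown.
jvm = spark._jvm
j_session = jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate()
jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.disablePushdownSession(j_session)

# Per-read alternative:
df = (spark.read.format("snowflake")
      .options(**options)
      .option("autopushdown", "off")
      .option("dbtable", "MY_TABLE")
      .load())
```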

by User16790091296, Contributor II
  • 1067 Views
  • 1 reply
  • 1 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

The open source Spark connector for Snowflake is available by default in the Databricks Runtime. To connect you can use the following code:
# Use secrets DBUtil to get Snowflake credentials.
user = dbutils.secrets.get("<scope>", "<secret key>")
passw...
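A fuller sketch along the same lines as the truncated snippet above (scope, key, and connection values are placeholders):

```python
# Sketch: pull credentials from a secret scope, then read via the connector.
user = dbutils.secrets.get("<scope>", "<user-key>")
password = dbutils.secrets.get("<scope>", "<password-key>")

options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

df = (spark.read.format("snowflake")
      .options(**options)
      .option("dbtable", "MY_TABLE")
      .load())
```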

by Anonymous, Not applicable
  • 947 Views
  • 0 replies
  • 0 kudos

Append subset of columns to target Snowflake table

I'm using the Databricks-Snowflake connector to load data into a Snowflake table. Can someone point me to any example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...
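The thread has no replies, but one hedged approach is to select just the columns you want to append and, when names differ, map them with the connector's columnmap option; the column and table names below are illustrative, and columns you leave out fall back to the target table's defaults.

```python
# Sketch: append a column subset, mapping DataFrame names to target names.
(df.select("id", "amount")
   .write.format("snowflake")
   .options(**options)
   .option("dbtable", "TARGET_TABLE")
   .option("columnmap", "Map(id -> ID, amount -> AMT)")
   .mode("append")
   .save())
```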
