Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflakeT...
We ended up using device flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code; you open the URL in a new tab an...
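For reference, once the device flow has handed you an access token, the Snowflake Spark connector can authenticate with it directly. A minimal sketch, assuming you already hold the token in `oauth_token` and that the placeholder account/warehouse values are filled in for your environment:

```python
# Sketch: read from Snowflake using an OAuth token obtained out-of-band
# (e.g. via the device flow described above). Placeholders in <...>.
options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "sfAuthenticator": "oauth",  # authenticate with a token instead of a password
    "sfToken": oauth_token,      # token from the device flow
}

df = (spark.read.format("snowflake")
      .options(**options)
      .option("query", "select current_user()")
      .load())
```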
When I am trying to read a Snowflake table from my Databricks notebook, it gives an error. The code:

```python
df1.read.format("snowflake") \
    .options(**options) \
    .option("query", "select * from abc") \
    .save()
```

Getting the error below:

java.sql.SQLException: No suitable dri...
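Note that the snippet above mixes the read and write APIs: a Snowflake read goes through `spark.read` and ends with `.load()`, not `df1.read ... .save()`. A minimal sketch of the usual pattern, assuming `options` holds the connection settings:

```python
# Read the result of a query from Snowflake; note load(), not save().
df1 = (spark.read.format("snowflake")
       .options(**options)                       # sfUrl, sfUser, credentials, etc.
       .option("query", "select * from abc")
       .load())
```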
Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:

```sql
select
  col1,
  col2,
  last_value(col2) over (partition by col1 order by col2) as column2_last
from values
...
```
For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the syntax below. I understand l...
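The underlying reason: with an ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value returns the current row's value. A sketch of the explicit frame that returns the true last value per partition (table name is a placeholder, run via spark.sql):

```python
# Extending the frame to the whole partition makes last_value behave as expected.
df = spark.sql("""
    select
      col1,
      col2,
      last_value(col2) over (
        partition by col1
        order by col2
        rows between unbounded preceding and unbounded following
      ) as column2_last
    from my_table  -- placeholder table
""")
```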
My job, after doing all the processing in the Databricks layer, writes the final output to Snowflake tables using the df.write API and the Spark Snowflake connector. I often see that even a small dataset (16 partitions and 20k rows in each partition) takes a...
There are a few options I tried out which gave me better performance. Caching the intermediate or final results so that the dataframe computation does not repeat while writing. Coalescing the results into partitions 1x or 0.5x your number...
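A minimal sketch combining the caching and coalescing tips above, assuming `options` holds the Snowflake connection settings and the table name is a placeholder:

```python
# Cache so the lineage is not recomputed during the write,
# then reduce the partition count before handing rows to the connector.
final_df = final_df.cache()
final_df.count()  # materialize the cache

(final_df
 .coalesce(16)                       # roughly 1x the executor core count, per the tip above
 .write.format("snowflake")
 .options(**options)
 .option("dbtable", "TARGET_TABLE")  # placeholder target table
 .mode("append")
 .save())
```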
Does Spark support multi-statement writes to Snowflake in a single session? To elaborate, I have a requirement where I need to do a selective deletion of data from a Snowflake table and insert records into a Snowflake table (ranges from around 1M rows)...
In my analysis I got the following understanding: if your data is sitting in Snowflake and you have a set of DDL/DML queries that need to be wrapped into a single transaction, you can set the MULTI_STATEMENT_COUNT parameter to 0 and use the Snowflake Utils runQuery method t...
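A sketch of that pattern, calling the connector's Scala Utils class from PySpark through the JVM gateway (a commonly used approach; table names and the delete predicate are placeholders, and it assumes MULTI_STATEMENT_COUNT has been set to 0 for the user/session as described above):

```python
# Run DDL/DML directly in Snowflake via the connector's Utils class.
# `options` is the same dict of connection settings used for reads/writes.
sf_utils = spark._jvm.net.snowflake.spark.snowflake.Utils

sf_utils.runQuery(options, """
    begin;
    delete from target_table where load_date = '2023-01-01';  -- placeholder predicate
    insert into target_table select * from staging_table;     -- placeholder source
    commit;
""")
```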
Hello all, I've been experiencing the error described below, where I try to query a table from Snowflake which is about ~5.5B rows and ~30 columns, and it fails almost systematically; specifically, either the Spark job doesn't even start or I get the ...
Hey there @hamzatazib96, does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
I have a job running with no issues on Databricks Runtime 7.3 LTS. When I upgraded to 8.3 it fails with the error: An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driv...
Adding to @Sean Owen's comments: the only reason this was working before is that the optimizer was evaluating it locally, rather than creating a context on the executors and evaluating it there.
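The usual culprit behind this error is a UDF that references `spark` or the SparkContext, which only exist on the driver. A minimal sketch of the anti-pattern and the fix (names are illustrative):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Broken: the lambda closes over `spark`, which is not available on executors.
# bad_udf = udf(lambda x: spark.conf.get("spark.app.name") + x, StringType())

# Fixed: resolve the value on the driver and capture only the plain string.
app_name = spark.conf.get("spark.app.name")  # evaluated driver-side
good_udf = udf(lambda x: app_name + "-" + x, StringType())

df = spark.createDataFrame([("a",), ("b",)], ["x"])
df.select(good_udf("x").alias("tagged")).show()
```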
How do I get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument in SQL, and this SQL gets the records from a Snowflake database based on this timestamp.
Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members. Cheers!
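One way to approach the question above is the Jobs API `runs/list` endpoint, which returns recent runs with their start times. A sketch, where the workspace URL, token, and job id are placeholders for your environment:

```python
import requests

# Fetch the most recent run of a job via the Jobs API 2.1.
host = "https://<workspace-url>"
headers = {"Authorization": "Bearer <personal-access-token>"}

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"job_id": 123, "limit": 1},  # 123 is a placeholder job id
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
if runs:
    last_run_ts = runs[0]["start_time"]  # epoch milliseconds
    print(last_run_ts)
```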
Hi everyone, I am working with Databricks Notebooks and I am facing an issue with the Snowflake connector: I want to use DDL/DML with the Snowflake connector. Can someone please help me out with this? Thanks in advance!
Hello, I've noticed that collect_set and collect_list are not pushed down to the database. Runtime: DB 9.1 LTS, Spark 3.1.2; database: Snowflake. Is there any way to get a distinct set from a group by in a way that will push the query down to the database?
Hm, so collect_set does not get translated to LISTAGG. Can you try the following?
- use a more recent version of DBR
- use Delta Lake as the Spark source
- use the latest version of the Snowflake connector
- check if pushdown to Snowflake is enabled (see the sketch below)
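On the last point, the connector exposes a session-level pushdown switch. A sketch of enabling it from PySpark via the JVM gateway, a commonly cited pattern (verify the class path against your connector version; pushdown is on by default in recent releases, so this just makes the setting explicit):

```python
# Explicitly enable query pushdown for the current Spark session.
spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(
    spark._jsparkSession
)
```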
Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...
Hi, I am wondering what documentation exists on query pushdown in Snowflake. I noticed that a single function (monotonically_increasing_id()) prevented the entire query from being pushed down to Snowflake during an ETL process. Is pushdown coming from the S...
Hi Sam, the Spark connector applies predicate and query pushdown by capturing and analyzing the Spark logical plans for SQL operations. When the data source is Snowflake, the operations are translated into a SQL query and then executed in Snowflake to...
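A practical way to check whether a given plan was pushed down is to inspect the physical plan: when pushdown succeeds, the plan shows a single Snowflake scan carrying the generated SQL, whereas untranslatable expressions such as monotonically_increasing_id() fall back to Spark-side operators. A sketch, with `options` and the table name as placeholders:

```python
from pyspark.sql import functions as F

df = (spark.read.format("snowflake")
      .options(**options)             # connection settings as elsewhere
      .option("dbtable", "my_table")  # placeholder table
      .load())

# Pushed down: the filter and aggregate should appear inside the generated Snowflake query.
df.filter(F.col("col1") > 10).groupBy("col2").count().explain()

# Not pushed down: this expression forces Spark-side evaluation of the plan.
df.withColumn("row_id", F.monotonically_increasing_id()).explain()
```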
The open source Spark connector for Snowflake is available by default in the Databricks runtime. To connect you can use the following code:

```python
# Use secrets DBUtil to get Snowflake credentials.
user = dbutils.secrets.get("<scope>", "<secret key>")
passw...
```
I'm using the Databricks-Snowflake connector to load data into a Snowflake table. Can someone point me to an example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...
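The straightforward approach is to select only the columns you want to load and append, which works as long as the remaining target columns are nullable or have defaults; the connector also documents a column_mapping parameter for matching columns by name rather than position. A sketch, with table and column names as placeholders:

```python
# Append a subset of columns; the target's other columns must be
# nullable or carry defaults.
(df.select("id", "name", "updated_at")      # placeholder column subset
   .write.format("snowflake")
   .options(**options)
   .option("dbtable", "TARGET_TABLE")       # placeholder target table
   .option("column_mapping", "name")        # match columns by name, not position
   .mode("append")
   .save())
```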