
Slow read using Snowflake connector in Databricks.

peta
New Contributor II

Hi,

I am trying to read a table from Snowflake with the Databricks native Snowflake JDBC connector. It works well for a small amount of data (100 rows), but if I add more (even just 1,000 rows) the query does not finish. I checked whether the query finishes on the Snowflake side, and there it takes just a few milliseconds. Could you please advise where the problem could be?

3 REPLIES

Kaniz
Community Manager

Hi @peta , 

When a query performs well on a small amount of data but slows down on larger datasets, there could be several factors at play.

Here are some common causes for this issue and how you can address them:

  1. Network Latency: If you're transferring data from Snowflake to Databricks over a network, network latency can be a significant factor affecting query performance. Slow network connections, due to either latency or limited bandwidth, can lead to increased query response times.

    To tackle this problem, you might want to optimize your network's throughput or consider replicating the data to Databricks storage, which can result in faster data retrieval.

  2. Snowflake Warehouse Size and Concurrency: Your Snowflake warehouse may be too small for the compute resources the data size requires, or it may be affected by high concurrency from other users. Snowflake may queue or limit queries under these conditions.

    To mitigate this, consider scaling up your Snowflake warehouse; you can point the connector at a specific warehouse via its options (see the sketch after this list). Reviewing and optimizing your query, especially if it contains suboptimal SQL constructs, can also help.

  3. JDBC Driver: An outdated version of the Snowflake JDBC driver or suboptimal driver configurations can also lead to slow query performance.

    To resolve this, ensure you're using the latest version of the Snowflake JDBC driver. You can also fine-tune driver configurations, such as connection-string parameters that control how data is encrypted in transit between Snowflake and Databricks; the options dictionary in the sketch after this list is where such parameters would go.

  4. Data Parsing and Processing: Large data transfers can hurt processing efficiency, particularly when data is parsed and structured for in-memory processing. The communication between Snowflake and Databricks goes over a JDBC connection, which involves some overhead in parsing CSV data into Spark DataFrame objects.

    To optimize the processing, consider a more efficient columnar format such as Apache Parquet where possible. Additionally, defining a custom schema for the Spark DataFrame can significantly improve data processing efficiency (a JDBC-reader sketch follows the closing paragraph below).
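As a concrete reference for points 2 and 3, here is a minimal sketch of a read through the Spark Snowflake connector in a Databricks notebook. Every connection value below is a placeholder to substitute with your own (in practice the password should come from a secret scope), and the sfWarehouse option is where you would point the job at a larger warehouse:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

# Placeholder connection options -- substitute your own account, credentials,
# database, schema, and warehouse. Extra driver/connection parameters (point 3)
# go into this same dictionary.
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",  # prefer dbutils.secrets.get(...) in practice
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",  # a larger warehouse can help with point 2
}

df = (
    spark.read.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "<table>")  # or .option("query", "SELECT ...")
    .load()
)
df.show(10)
```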

By addressing these issues, you can work toward improving the performance of your Snowflake queries in Databricks.
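If you are instead reading through Spark's generic JDBC source with the Snowflake JDBC driver, two built-in reader options are worth experimenting with for point 4: customSchema overrides the column types Spark would otherwise derive from the driver metadata, and fetchsize controls how many rows the driver pulls per network round trip. The connection details and the two-column schema below are purely illustrative:

```python
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:snowflake://<account>.snowflakecomputing.com/")
    .option("driver", "net.snowflake.client.jdbc.SnowflakeDriver")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("dbtable", "<schema>.<table>")
    .option("fetchsize", "10000")  # rows fetched per driver round trip
    .option("customSchema", "ID DECIMAL(38, 0), NAME STRING")  # illustrative columns
    .load()
)
```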


peta
New Contributor II

Hi @Kaniz ,

Thank you for your reply.

1. The network should not be a problem; the communication is between two Azure subscriptions, but I am checking it with our operations team just to be sure.

2. We are using the smallest Snowflake warehouse (XS) and it seems to be working fine; all the queries I send through JDBC end with Success on the Snowflake side, but in Databricks we do not get the data. Furthermore, I am only running really small queries there. Do you think a bigger warehouse would be better for the communication?

3. We are now using snowflake-jdbc 3.13.29, which should be the latest one. Which connection parameters do you think can be tuned for better performance?

4. How can I use Parquet files instead of CSV through JDBC?

Thanks.

Petra 

sri123
New Contributor II

Hi @peta 

Has your issue been resolved? I'm facing a similar issue. If so, could you please post the steps you followed to resolve it?

Thanks & Regards

Sri