Snowflake connector

Phani1
Valued Contributor II

Hi Team, Databricks recommends storing data in cloud storage, but if we connect directly to Snowflake using the Snowflake connector, will we face any performance issues?

Could you please suggest the best way to read a large volume of data from Snowflake to Databricks?


Wojciech_BUK
Valued Contributor III

If you read data from Snowflake in Spark using, e.g., spark.read.jdbc with default options, it will be slow. This is because the whole result is fetched in a single step over one connection, and is therefore loaded by a single executor.

You need to distribute the query among the Spark executors and have each executor read a subset of the result, e.g. by adding WHERE conditions or LIMIT/OFFSET clauses, one per executor.
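A minimal sketch of the WHERE-condition approach, assuming a hypothetical date column order_date and one partition per calendar month. The helper names monthly_predicates and read_with_predicates are illustrative; the JDBC URL, table and connection properties are yours to supply. The list of predicates is passed to spark.read.jdbc, which creates one partition (and therefore one task) per predicate.

```python
from datetime import date


def monthly_predicates(column, start, end):
    """Build one WHERE-clause predicate per calendar month in [start, end).

    Each predicate becomes one Spark partition, so each executor task reads
    only its own slice of the result. The column name is hypothetical.
    """
    predicates = []
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month.
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        lo = max(current, start)  # clip the first month to the start date
        hi = min(nxt, end)        # clip the last month to the end date
        predicates.append(
            f"{column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'"
        )
        current = nxt
    return predicates


def read_with_predicates(spark, jdbc_url, table, predicates, properties):
    # DataFrameReader.jdbc issues one query per predicate, in parallel.
    return spark.read.jdbc(
        url=jdbc_url, table=table, predicates=predicates, properties=properties
    )
```

Predicates must not overlap and together must cover every row you need, otherwise rows are duplicated or silently dropped; half-open ranges (>= lower, < upper) make that easy to guarantee.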

The JDBC method also accepts the options partitionColumn, lowerBound, upperBound, and numPartitions, which let Spark parallelize the read.
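The sketch below is a simplified model (not Spark's actual source) of how these four options turn into per-partition WHERE clauses, assuming non-negative integer bounds and at least two partitions; the column name id and the bound values are hypothetical. One detail worth knowing: lowerBound and upperBound only set the stride, they do not filter rows, so values outside the range end up in the first or last partition.

```python
def jdbc_partition_clauses(column, lower_bound, upper_bound, num_partitions):
    """Simplified model of the WHERE clause Spark's JDBC source generates
    for each partition. Assumes non-negative integer bounds and
    num_partitions >= 2."""
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    clauses = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower bound, the last has no upper bound,
        # so out-of-range values (and NULLs) are still read.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if upper is None:
            clauses.append(lower)
        elif lower is None:
            clauses.append(f"{upper} or {column} is null")
        else:
            clauses.append(f"{lower} AND {upper}")
    return clauses


def read_partitioned(spark, jdbc_url, table, properties,
                     column="id", lower=0, upper=1000, partitions=4):
    # Hypothetical column and bounds; pick a numeric, evenly distributed column.
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("partitionColumn", column)
        .option("lowerBound", lower)
        .option("upperBound", upper)
        .option("numPartitions", partitions)
        .options(**properties)
        .load()
    )
```

If the partition column is skewed, some partitions will be much larger than others, so the stride-based split parallelizes well only when values are roughly uniform between the bounds.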

Another option is to sync the data into Delta Lake and run your queries against the Delta table.
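A minimal sketch of a one-off sync, assuming the Snowflake Spark connector available on Databricks clusters (format "snowflake" with the sfUrl, sfUser, sfPassword, sfDatabase, sfSchema and sfWarehouse options); the helper names and table names are hypothetical. In practice the password should come from a secret scope rather than plain text.

```python
def snowflake_options(account_url, user, password, database, schema, warehouse):
    # Connection options read by the Snowflake Spark connector.
    return {
        "sfUrl": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def sync_to_delta(spark, sf_options, source_table, target_table):
    """Copy a Snowflake table into a Delta table, so later queries run
    against Delta instead of going back to Snowflake each time."""
    df = (
        spark.read.format("snowflake")
        .options(**sf_options)
        .option("dbtable", source_table)
        .load()
    )
    # Full overwrite for simplicity; an incremental load may suit large tables better.
    df.write.format("delta").mode("overwrite").saveAsTable(target_table)
    return df
```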

-------
UPDATE: there is one more option, but it would require a redesign on the Snowflake end: create the table in Snowflake as an external Iceberg table and connect your Databricks job to the Iceberg table. That might be overkill, though.

Phani1
Valued Contributor II

Thanks!!
