Data Engineering

Snowflake connector

Phani1
Valued Contributor

Hi Team, Databricks recommends storing data in a cloud storage location, but if we directly connect to Snowflake using the Snowflake connector, will we face any performance issues?

Could you please suggest the best way to read a large volume of data from Snowflake to Databricks?

2 REPLIES

Wojciech_BUK
Contributor III

If you read data from Snowflake in Spark using e.g. spark.read.jdbc without any partitioning options, it will be slow. This is because the data is loaded in a single step, through a single query, and is therefore loaded by a single executor.

You need to distribute the query among the Spark executors and assign each executor a subset of the result to read, e.g. by adding WHERE conditions or limit-offset clauses and distributing them among the executors.
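A minimal sketch of the WHERE-condition approach, assuming a numeric key column whose range you already know. The helper names (range_predicates, read_in_parallel) are made up for illustration, and the URL, table, and column in the usage lines are placeholders. The predicates argument of spark.read.jdbc is a real PySpark parameter: Spark creates one partition per predicate. Note that rows outside the chosen range, and rows where the column is NULL, would be skipped by these predicates.

```python
def range_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into contiguous WHERE clauses, one per partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    span = upper - lower
    predicates = []
    for i in range(num_partitions):
        # Integer arithmetic keeps the slices contiguous and non-overlapping.
        start = lower + (span * i) // num_partitions
        end = lower + (span * (i + 1)) // num_partitions
        predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


def read_in_parallel(spark, url, table, predicates, properties):
    """Issue one JDBC query per predicate; each becomes its own Spark partition."""
    return spark.read.jdbc(
        url=url, table=table, predicates=predicates, properties=properties
    )


# Hypothetical usage (placeholders, not real connection details):
# preds = range_predicates("ORDER_ID", 0, 1_000_000, 8)
# df = read_in_parallel(spark, "jdbc:snowflake://<account>.snowflakecomputing.com/",
#                       "ORDERS", preds, {"user": "...", "password": "..."})
```

The number of predicates is also the number of concurrent queries sent to Snowflake, so it is worth keeping it in line with what the warehouse can handle.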

The JDBC method also has an option to supply the following information: partitionColumn, lowerBound, upperBound, and numPartitions. With these set, you can parallelize the read.
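As a rough sketch, the four options can be passed to the generic JDBC reader like this. The helper names and all values in the usage lines are placeholders I made up; credentials and driver settings are left out. The option keys themselves (url, dbtable, partitionColumn, lowerBound, upperBound, numPartitions) are the standard Spark JDBC options. The partition column has to be a numeric, date, or timestamp column, and lowerBound/upperBound only decide the partition stride; they do not filter rows.

```python
def partitioned_jdbc_options(url, table, partition_column,
                             lower_bound, upper_bound, num_partitions):
    """Build the option map for a partitioned Spark JDBC read."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        # Bounds set the stride between partitions; rows outside them
        # still land in the first or last partition.
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Run the read; Spark issues up to numPartitions queries in parallel."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical usage (placeholders, not real connection details):
# opts = partitioned_jdbc_options(
#     "jdbc:snowflake://<account>.snowflakecomputing.com/",
#     "ORDERS", "ORDER_ID", 0, 1_000_000, 8)
# df = read_partitioned(spark, opts)
```

If the values of the partition column are heavily skewed, the partitions will be uneven, so a roughly uniformly distributed column works best here.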

Another way is to sync the data into Delta Lake and run your query against the Delta table.
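One possible shape for the sync, assuming you already have a DataFrame read from Snowflake (for example via one of the parallel reads). The function names and the table name in the usage lines are hypothetical, and a full overwrite is only the simplest option; an incremental load would need more logic than shown here.

```python
def sync_to_delta(source_df, target_table, mode="overwrite"):
    """Persist a DataFrame read from Snowflake as a managed Delta table."""
    source_df.write.format("delta").mode(mode).saveAsTable(target_table)


def query_delta(spark, target_table, where_clause):
    """Query the synced Delta table instead of going back to Snowflake."""
    return spark.table(target_table).where(where_clause)


# Hypothetical usage (table name is a placeholder):
# sync_to_delta(snowflake_df, "bronze.orders")
# recent = query_delta(spark, "bronze.orders", "ORDER_DATE >= '2024-01-01'")
```

After the sync, repeated queries run against cloud storage and do not open new connections to Snowflake.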

-------
UPDATE: there is one more way, but it would require a redesign on the Snowflake end: create the table in Snowflake as an external Iceberg table and connect your Databricks job to Iceberg. That might be overkill, though.

Phani1
Valued Contributor

Thanks !!
