Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Snowflake connector

Phani1
Valued Contributor II

Hi Team, Databricks recommends storing data in a cloud storage location, but if we directly connect to Snowflake using the Snowflake connector, will we face any performance issues?

Could you please suggest the best way to read a large volume of data from Snowflake to Databricks?

2 REPLIES

Wojciech_BUK
Valued Contributor III

If you read data from Snowflake in Spark using, e.g., spark.read.jdbc, it will be slow. The data is fetched in a single step over one connection, and is therefore loaded by a single executor.

You need to somehow distribute the query among the Spark executors, assigning each executor a subset of the result, e.g. by adding WHERE conditions or limit-offset clauses and spreading them across the executors.

The JDBC method also accepts the options partitionColumn, lowerBound, upperBound, and numPartitions, which let Spark parallelize the read for you.
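A minimal sketch of such a partitioned JDBC read. The account URL, table, credentials, column, and bounds below are all hypothetical placeholders; the `spark` session is assumed to exist on a Databricks cluster.

```python
# Hypothetical connection details -- replace with your own Snowflake
# account, database, warehouse, table, and credentials.
jdbc_url = (
    "jdbc:snowflake://myaccount.snowflakecomputing.com/"
    "?db=MYDB&warehouse=MYWH"
)

# With these options Spark issues numPartitions queries in parallel, each
# covering one slice of partitionColumn between lowerBound and upperBound,
# so the result is read by many executors instead of one.
read_options = {
    "url": jdbc_url,
    "dbtable": "PUBLIC.ORDERS",     # hypothetical table
    "user": "my_user",
    "password": "my_password",
    "driver": "net.snowflake.client.jdbc.SnowflakeDriver",
    "partitionColumn": "ORDER_ID",  # must be numeric, date, or timestamp
    "lowerBound": "1",
    "upperBound": "10000000",
    "numPartitions": "16",          # 16 parallel range queries
}

# On a cluster with the Snowflake JDBC driver installed:
# df = spark.read.format("jdbc").options(**read_options).load()
```

Pick bounds that roughly cover the real min/max of the column; rows outside the bounds are still read, but they all land in the first or last partition and skew the load.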

Another way is to sync the data into Delta Lake and run your queries against the Delta table.
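The sync-to-Delta approach could look like this sketch. It assumes `df` is a DataFrame already read from Snowflake and `spark` is the session; the path and table name are hypothetical.

```python
# One-time (or scheduled) copy: land the Snowflake extract as a Delta
# table, then point all downstream jobs at the Delta copy instead of
# hitting Snowflake repeatedly.
target_path = "/mnt/datalake/snowflake_sync/orders"  # hypothetical location

# Write the extract into Delta Lake:
# df.write.format("delta").mode("overwrite").save(target_path)

# Register it so it can be queried by name:
create_sql = (
    "CREATE TABLE IF NOT EXISTS orders_delta "
    f"USING DELTA LOCATION '{target_path}'"
)
# spark.sql(create_sql)

# Later reads now scan cloud storage, not Snowflake:
# orders = spark.read.format("delta").load(target_path)
```

The trade-off is freshness: the Delta copy is only as current as the last sync, so this fits batch workloads better than ones that need live Snowflake data.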

-------
UPDATE: there is one more way, but it would require a redesign on the Snowflake end -> create the table in Snowflake as an external Iceberg table and connect your Databricks job to Iceberg, but that might be overkill.

Phani1
Valued Contributor II

Thanks !!