Ingest data from Snowflake to Databricks

aharisaibabu · 4 weeks ago

Hi Team,

I have some confusion regarding the best approach for ingesting data from Snowflake into Databricks using custom SQL queries.

While evaluating the available options, I found multiple approaches:

Snowflake Spark Connector
JDBC
Query Federation
Lakeflow Connect (currently in Preview)

From my understanding, the Snowflake Spark Connector appears to provide better performance for data ingestion. However, I noticed some conflicting guidance between the Databricks and Snowflake documentation. Because of this confusion, I am leaning toward using JDBC.
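For reference, the JDBC read I have in mind looks roughly like the sketch below. The helper names, account, credentials, and query are placeholders I made up for illustration, and it assumes the Snowflake JDBC driver is available on the cluster.

```python
def snowflake_jdbc_options(account, user, password, database, warehouse, query):
    """Build options for Spark's generic JDBC reader (hypothetical helper)."""
    return {
        # Snowflake JDBC URL; db and warehouse are passed as connection parameters.
        "url": (
            f"jdbc:snowflake://{account}.snowflakecomputing.com/"
            f"?db={database}&warehouse={warehouse}"
        ),
        "driver": "net.snowflake.client.jdbc.SnowflakeDriver",
        "user": user,
        "password": password,
        # Spark's JDBC source accepts a custom SELECT through the "query" option.
        "query": query,
    }


def read_via_jdbc(spark, options):
    """Run the custom query through the JDBC source and return a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values only; in a notebook, `spark` is the existing SparkSession.
jdbc_opts = snowflake_jdbc_options(
    account="myaccount",
    user="my_user",
    password="my_password",  # placeholder; I would read this from a secret scope
    database="MY_DB",
    warehouse="MY_WH",
    query="SELECT id, amount FROM my_schema.orders WHERE amount > 100",
)
# df = read_via_jdbc(spark, jdbc_opts)
```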

Databricks documentation states:

"The legacy query federation documentation has been retired and might not be updated. The configurations mentioned in this content are not officially endorsed or tested by Databricks. If Lakehouse Federation supports your source database, Databricks recommends using that instead."

On the other hand, the Snowflake documentation indicates that the Snowflake Connector for Spark is supported on Databricks Runtime 4.2 and above.
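A custom-query read with the Snowflake Spark Connector, as I understand it from the Snowflake documentation, would look roughly like the sketch below. The helper names and all connection values are placeholders I made up, and I am assuming the short format name `snowflake` resolves to the connector bundled with the Databricks runtime.

```python
def snowflake_connector_options(account, user, password, database, schema,
                                warehouse, query):
    """Build options for the Snowflake Spark Connector (hypothetical helper)."""
    return {
        "sfURL": f"{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
        # "query" takes a custom SELECT; "dbtable" would read a whole table instead.
        "query": query,
    }


def read_via_connector(spark, options):
    """Run the custom query through the connector and return a DataFrame."""
    return spark.read.format("snowflake").options(**options).load()


# Placeholder values only; in a notebook, `spark` is the existing SparkSession.
connector_opts = snowflake_connector_options(
    account="myaccount",
    user="my_user",
    password="my_password",  # placeholder; I would read this from a secret scope
    database="MY_DB",
    schema="MY_SCHEMA",
    warehouse="MY_WH",
    query="SELECT id, amount FROM orders WHERE amount > 100",
)
# df = read_via_connector(spark, connector_opts)
```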

This has left me with a few questions:

1. Is the Snowflake Spark Connector still considered a recommended and supported approach for reading data from Snowflake into Databricks?
2. Are there any known limitations or concerns when using the Snowflake Spark Connector with current Databricks runtimes?
3. For custom-query-based ingestion from Snowflake, would Databricks recommend using the Snowflake Spark Connector or JDBC?
4. What is the preferred long-term architecture considering future support and performance?

References:

1. Configuring Snowflake for Spark in Databricks | Snowflake Documentation

2. Read and write data from Snowflake | Databricks on AWS

Any guidance or best practices would be greatly appreciated.

Thanks!