<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks(AWS) to snowflake connection in Khoros Community Forums Support (Not for Databricks Product Questions)</title>
    <link>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117162#M146</link>
<description>&lt;DIV class="paragraph"&gt;To connect Databricks to Snowflake and set up a system for tracking files loaded into your Snowflake stage table, you can use the following approach:&lt;/DIV&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Configure Databricks Snowflake Connector&lt;/STRONG&gt;:&lt;BR /&gt;Databricks provides a built-in Snowflake connector that integrates with Snowflake. Steps to set it up include:
&lt;UL&gt;
&lt;LI&gt;Add the Snowflake JDBC driver and Snowflake Spark connector library to your Databricks environment.&lt;/LI&gt;
&lt;LI&gt;Set the connection options such as &lt;CODE&gt;sfURL&lt;/CODE&gt; (Snowflake account URL), &lt;CODE&gt;sfWarehouse&lt;/CODE&gt; (Snowflake warehouse), &lt;CODE&gt;sfDatabase&lt;/CODE&gt; (database name), &lt;CODE&gt;sfSchema&lt;/CODE&gt; (schema name), &lt;CODE&gt;sfRole&lt;/CODE&gt; (optional, Snowflake role), and the authentication credentials (&lt;CODE&gt;sfUser&lt;/CODE&gt; and &lt;CODE&gt;sfPassword&lt;/CODE&gt;).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Example connection code in Python: ```python options = { "sfURL": "&amp;lt;Your Snowflake URL&amp;gt;", "sfWarehouse": "&amp;lt;Your Snowflake Warehouse&amp;gt;", "sfDatabase": "&amp;lt;Your Snowflake Database&amp;gt;", "sfSchema": "&amp;lt;Your Snowflake Schema&amp;gt;", "sfRole": "&amp;lt;Optional Your Snowflake Role&amp;gt;", "sfUser": "&amp;lt;Your Username&amp;gt;", "sfPassword": "&amp;lt;Your Password&amp;gt;" }&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;snowflake_df = spark.read.format("snowflake").options(**options).option("dbtable", "&amp;lt;Your Snowflake Table&amp;gt;").load() ```&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Similarly, you can use the &lt;CODE&gt;.write&lt;/CODE&gt; method to store files into Snowflake once they are processed and loaded by Auto Loader.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Set Up File Tracking Using Auto Loader&lt;/STRONG&gt;:&lt;BR /&gt;Auto Loader inherently tracks file ingestion progress by maintaining metadata in its checkpoint location. This ensures exactly-once ingestion and fault tolerance when processing files. However, since you wish to track this file metadata in Snowflake:
&lt;UL&gt;
&lt;LI&gt;You can capture the file metadata (such as file name, timestamp, and status) during ingestion via Auto Loader.&lt;/LI&gt;
&lt;LI&gt;Write this metadata to a tracking table in Snowflake using the Snowflake connector.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;For example, while using Auto Loader: ```python from pyspark.sql.functions import input_file_name&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Load files with Auto Loader df = (spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("&amp;lt;Your Cloud Storage Path&amp;gt;") .withColumn("source_file", input_file_name())) # Capture file information&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Write DataFrame to Snowflake (assuming &lt;CODE&gt;df&lt;/CODE&gt; now has both file records and metadata) df.write.format("snowflake").options(**options).option("dbtable", "&amp;lt;Snowflake Tracking Table&amp;gt;").save() ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Advantages of Track File Metadata in Snowflake&lt;/STRONG&gt;:&lt;BR /&gt;By leveraging Snowflake's capabilities as the target system, you can:
&lt;UL&gt;
&lt;LI&gt;Query file tracking data for auditing purposes.&lt;/LI&gt;
&lt;LI&gt;Join the tracking table with the actual stage table to ensure that all files are accounted for and processed correctly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
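&lt;DIV class="paragraph"&gt;As a plain-Python sketch of the reconciliation idea in step 3 (the table names &lt;CODE&gt;file_tracking&lt;/CODE&gt; and &lt;CODE&gt;stage_table&lt;/CODE&gt; are hypothetical placeholders), the anti-join that flags unprocessed files can be built as SQL and pushed down to Snowflake through the connector's &lt;CODE&gt;query&lt;/CODE&gt; option:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;```python
# Hypothetical table names; adjust to your environment.
tracking_table = "file_tracking"
stage_table = "stage_table"

# Anti-join: files recorded by Auto Loader that never reached the stage table.
reconcile_sql = f"""
SELECT t.source_file
FROM {tracking_table} t
LEFT JOIN {stage_table} s ON t.source_file = s.source_file
WHERE s.source_file IS NULL
""".strip()

# With the same `options` dict as above, the query can be executed in
# Snowflake via the connector (sketch, not executed here):
#   spark.read.format("snowflake").options(**options) \
#        .option("query", reconcile_sql).load()
print(reconcile_sql)
```&lt;/DIV&gt;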
&lt;DIV class="paragraph"&gt;This process combines the incremental ingestion power of Auto Loader in Databricks with the centralized tracking and querying power of Snowflake.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps.&lt;/DIV&gt;</description>
    <pubDate>Wed, 30 Apr 2025 13:53:27 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-04-30T13:53:27Z</dc:date>
    <item>
      <title>Databricks(AWS) to snowflake connection</title>
      <link>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117124#M145</link>
<description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I have implemented Auto Loader and it is working as expected. I need to track the files that are loaded into the stage table.&lt;/P&gt;&lt;P&gt;The issue is that the file tracking table needs to be created in Snowflake, and I need to track the files from there.&lt;/P&gt;&lt;P&gt;How do I connect Databricks and Snowflake? Please suggest.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Apr 2025 10:04:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117124#M145</guid>
      <dc:creator>RameshChejarla</dc:creator>
      <dc:date>2025-04-30T10:04:38Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks(AWS) to snowflake connection</title>
      <link>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117162#M146</link>
<description>&lt;DIV class="paragraph"&gt;To connect Databricks to Snowflake and set up a system for tracking files loaded into your Snowflake stage table, you can use the following approach:&lt;/DIV&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Configure Databricks Snowflake Connector&lt;/STRONG&gt;:&lt;BR /&gt;Databricks provides a built-in Snowflake connector that integrates with Snowflake. Steps to set it up include:
&lt;UL&gt;
&lt;LI&gt;Add the Snowflake JDBC driver and Snowflake Spark connector library to your Databricks environment.&lt;/LI&gt;
&lt;LI&gt;Set the connection options such as &lt;CODE&gt;sfURL&lt;/CODE&gt; (Snowflake account URL), &lt;CODE&gt;sfWarehouse&lt;/CODE&gt; (Snowflake warehouse), &lt;CODE&gt;sfDatabase&lt;/CODE&gt; (database name), &lt;CODE&gt;sfSchema&lt;/CODE&gt; (schema name), &lt;CODE&gt;sfRole&lt;/CODE&gt; (optional, Snowflake role), and the authentication credentials (&lt;CODE&gt;sfUser&lt;/CODE&gt; and &lt;CODE&gt;sfPassword&lt;/CODE&gt;).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Example connection code in Python: ```python options = { "sfURL": "&amp;lt;Your Snowflake URL&amp;gt;", "sfWarehouse": "&amp;lt;Your Snowflake Warehouse&amp;gt;", "sfDatabase": "&amp;lt;Your Snowflake Database&amp;gt;", "sfSchema": "&amp;lt;Your Snowflake Schema&amp;gt;", "sfRole": "&amp;lt;Optional Your Snowflake Role&amp;gt;", "sfUser": "&amp;lt;Your Username&amp;gt;", "sfPassword": "&amp;lt;Your Password&amp;gt;" }&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;snowflake_df = spark.read.format("snowflake").options(**options).option("dbtable", "&amp;lt;Your Snowflake Table&amp;gt;").load() ```&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Similarly, you can use the &lt;CODE&gt;.write&lt;/CODE&gt; method to store files into Snowflake once they are processed and loaded by Auto Loader.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Set Up File Tracking Using Auto Loader&lt;/STRONG&gt;:&lt;BR /&gt;Auto Loader inherently tracks file ingestion progress by maintaining metadata in its checkpoint location. This ensures exactly-once ingestion and fault tolerance when processing files. However, since you wish to track this file metadata in Snowflake:
&lt;UL&gt;
&lt;LI&gt;You can capture the file metadata (such as file name, timestamp, and status) during ingestion via Auto Loader.&lt;/LI&gt;
&lt;LI&gt;Write this metadata to a tracking table in Snowflake using the Snowflake connector.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;For example, while using Auto Loader: ```python from pyspark.sql.functions import input_file_name&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Load files with Auto Loader df = (spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("&amp;lt;Your Cloud Storage Path&amp;gt;") .withColumn("source_file", input_file_name())) # Capture file information&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Write DataFrame to Snowflake (assuming &lt;CODE&gt;df&lt;/CODE&gt; now has both file records and metadata) df.write.format("snowflake").options(**options).option("dbtable", "&amp;lt;Snowflake Tracking Table&amp;gt;").save() ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Advantages of Track File Metadata in Snowflake&lt;/STRONG&gt;:&lt;BR /&gt;By leveraging Snowflake's capabilities as the target system, you can:
&lt;UL&gt;
&lt;LI&gt;Query file tracking data for auditing purposes.&lt;/LI&gt;
&lt;LI&gt;Join the tracking table with the actual stage table to ensure that all files are accounted for and processed correctly.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
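&lt;DIV class="paragraph"&gt;As a plain-Python sketch of the reconciliation idea in step 3 (the table names &lt;CODE&gt;file_tracking&lt;/CODE&gt; and &lt;CODE&gt;stage_table&lt;/CODE&gt; are hypothetical placeholders), the anti-join that flags unprocessed files can be built as SQL and pushed down to Snowflake through the connector's &lt;CODE&gt;query&lt;/CODE&gt; option:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;```python
# Hypothetical table names; adjust to your environment.
tracking_table = "file_tracking"
stage_table = "stage_table"

# Anti-join: files recorded by Auto Loader that never reached the stage table.
reconcile_sql = f"""
SELECT t.source_file
FROM {tracking_table} t
LEFT JOIN {stage_table} s ON t.source_file = s.source_file
WHERE s.source_file IS NULL
""".strip()

# With the same `options` dict as above, the query can be executed in
# Snowflake via the connector (sketch, not executed here):
#   spark.read.format("snowflake").options(**options) \
#        .option("query", reconcile_sql).load()
print(reconcile_sql)
```&lt;/DIV&gt;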
&lt;DIV class="paragraph"&gt;This process combines the incremental ingestion power of Auto Loader in Databricks with the centralized tracking and querying power of Snowflake.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps.&lt;/DIV&gt;</description>
      <pubDate>Wed, 30 Apr 2025 13:53:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117162#M146</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-04-30T13:53:27Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks(AWS) to snowflake connection</title>
      <link>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117283#M147</link>
<description>&lt;P&gt;Thanks for your response, I will try it and let you know.&lt;/P&gt;</description>
      <pubDate>Thu, 01 May 2025 04:24:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117283#M147</guid>
      <dc:creator>RameshChejarla</dc:creator>
      <dc:date>2025-05-01T04:24:22Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks(AWS) to snowflake connection</title>
      <link>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117411#M148</link>
<description>&lt;P&gt;I am trying to connect Snowflake to Databricks using a secret scope, and I do not have a username and password for Snowflake.&lt;/P&gt;&lt;P&gt;Can you please suggest how to handle this?&lt;/P&gt;</description>
      <pubDate>Thu, 01 May 2025 15:55:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/khoros-community-forums-support/databricks-aws-to-snowflake-connection/m-p/117411#M148</guid>
      <dc:creator>RameshChejarla</dc:creator>
      <dc:date>2025-05-01T15:55:16Z</dc:date>
    </item>
  </channel>
</rss>

