Near-real-time processing with CDC from Snowflake to Databricks

abelian-grape — Tue, 21 Jan 2025 14:45:31 GMT

Hi, I would like to set up near-real-time streaming on Databricks so that data is processed as soon as a new batch finishes processing in Snowflake, e.g. with DLT pipelines and Auto Loader. Which option would be better for this setup?

Option A)

Export the Snowpark DataFrame to external cloud storage (e.g. S3, as Parquet) and have Databricks pick it up from there.
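A minimal sketch of both halves of Option A, assuming a Snowflake external stage named SNOWFLAKE_EXPORT that points at s3://my-bucket/snowflake-export/ (the stage, bucket, table and prefix names are all hypothetical). The Python helpers only build the Snowflake COPY INTO <location> statement and the Auto Loader options, so they run anywhere; the DLT part is left as comments because it needs a Databricks runtime. Writing every export to a new timestamped folder means Auto Loader only ever sees new files, never a file overwritten in place.

```python
from datetime import datetime, timezone

# Hypothetical names: the Snowflake external stage @SNOWFLAKE_EXPORT is assumed
# to point at S3_ROOT (e.g. created with a storage integration).
STAGE = "SNOWFLAKE_EXPORT"
S3_ROOT = "s3://my-bucket/snowflake-export"


def batch_folder(batch_ts: datetime) -> str:
    """UTC timestamp folder name, one per export run (e.g. 20250121T144531Z)."""
    return batch_ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_unload_sql(table: str, prefix: str, batch_ts: datetime) -> str:
    """Snowflake COPY INTO <location> statement that unloads `table` as Parquet.

    Each run writes to a fresh folder, so downstream readers only ever
    see new files instead of files being replaced in place.
    """
    return (
        f"COPY INTO @{STAGE}/{prefix}/{batch_folder(batch_ts)}/ "
        f"FROM {table} "
        "FILE_FORMAT = (TYPE = PARQUET) "
        "HEADER = TRUE"  # keep the real column names in the Parquet schema
    )


def autoloader_options(use_notifications: bool = False) -> dict[str, str]:
    """Auto Loader (cloudFiles) options for reading the unloaded Parquet files."""
    opts = {"cloudFiles.format": "parquet"}
    if use_notifications:
        # File notification mode avoids repeatedly listing the directory,
        # which usually lowers pickup latency (needs extra cloud setup).
        opts["cloudFiles.useNotifications"] = "true"
    return opts


if __name__ == "__main__":
    ts = datetime(2025, 1, 21, 14, 45, 31, tzinfo=timezone.utc)
    print(build_unload_sql("ANALYTICS.PUBLIC.ORDERS", "orders", ts))
    print(autoloader_options(use_notifications=True))

# On Databricks, inside a DLT pipeline (DLT manages the schema and checkpoint
# locations itself, so they are not set here):
#
# import dlt
#
# @dlt.table(name="orders_bronze")
# def orders_bronze():
#     return (
#         spark.readStream.format("cloudFiles")
#         .options(**autoloader_options(use_notifications=True))
#         .load(f"{S3_ROOT}/orders/")
#     )
```

If the export starts from a Snowpark DataFrame rather than a table, `df.write.copy_into_location(...)` with `file_format_type="parquet"` and `header=True` should issue an equivalent statement. End-to-end latency on this path is roughly the Snowflake unload time plus the Auto Loader pickup delay, which file notification mode or a continuous DLT pipeline can shorten.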

Option B)

Use Apache Iceberg with Polaris, and configure Databricks to read the tables from that catalog.
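For Option B, the Databricks side mostly comes down to registering Polaris as an Iceberg REST catalog in Spark. A sketch, assuming the Iceberg Spark runtime (plus a storage bundle such as iceberg-aws-bundle for S3) is installed as a cluster library and that the compute type allows custom Spark catalogs, which may not hold for every Databricks compute type. The keys follow the Iceberg REST catalog settings used in the Polaris Spark quickstart; the catalog name `polaris`, the endpoint URL, the Polaris catalog `snowflake_catalog`, and the secret scope/key are hypothetical. The function only builds the key/value pairs and prints them in the "key value" form used by a cluster's Spark config box.

```python
def polaris_spark_conf(
    catalog: str,
    uri: str,
    credential: str,
    warehouse: str,
) -> dict[str, str]:
    """Spark settings for an Iceberg REST catalog served by Polaris.

    `credential` is the "<client-id>:<client-secret>" pair of a Polaris
    service principal (ideally a Databricks secret reference, not plain text).
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Iceberg SQL extensions (MERGE INTO, CALL procedures, ...).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Register `catalog` as an Iceberg catalog backed by the REST protocol.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # OAuth client credentials and the role scope requested from Polaris.
        f"{prefix}.credential": credential,
        f"{prefix}.scope": "PRINCIPAL_ROLE:ALL",
        # Name of the Polaris catalog that holds the Snowflake-written tables.
        f"{prefix}.warehouse": warehouse,
        # Ask Polaris to vend short-lived storage credentials for table files.
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


if __name__ == "__main__":
    conf = polaris_spark_conf(
        catalog="polaris",
        uri="https://polaris.example.com/api/catalog",
        # Hypothetical secret holding "<client-id>:<client-secret>".
        credential="{{secrets/polaris/credential}}",
        warehouse="snowflake_catalog",
    )
    for key, value in conf.items():
        print(f"{key} {value}")

# Then, in a notebook on that cluster (not runnable outside Spark):
#
# orders = (
#     spark.readStream.format("iceberg")
#     .load("polaris.analytics.orders")  # hypothetical namespace/table
# )
```

Iceberg's Spark streaming source reads appended snapshots incrementally, but by default it does not handle overwrite or delete snapshots (there are options to skip them), so this path fits append-style loads from Snowflake better than in-place updates.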

Re: Near-real-time processing with CDC from Snowflake to Databricks

saurabh18cs — Tue, 21 Jan 2025 14:36:40 GMT

It's a trade-off between latency on one side and complexity and cost on the other, so you'll have to weigh that for your own case 🙂 For me, Option A sounds reasonable.
