Build a streaming table on top of a Snowflake table
3 weeks ago
Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?
3 weeks ago
Hi abelian-grape,
How are you doing today? As far as I understand, it's not currently possible to create a streaming table directly on top of a Snowflake table accessed through Lakehouse Federation in Databricks. Lakehouse Federation lets you run read-only queries against external databases like Snowflake through Unity Catalog, but it doesn't support streaming or write operations. So if you need streaming, the best approach is to ingest the data from Snowflake into a Delta table first, and then build a streaming pipeline on top of that Delta table. You could use scheduled batch jobs, or tools like Fivetran, dbt, or custom JDBC logic, to pull data regularly from Snowflake into your Bronze layer, and build your streaming logic from there. Let me know if you want help setting that up!
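Here's a rough sketch of that batch-then-stream pattern in PySpark. The catalog, schema, and table names below are just placeholders for whatever your federated Snowflake catalog and Bronze/Silver tables are actually called:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read-only query against the Snowflake table exposed via Lakehouse Federation
source_df = spark.table("snowflake_catalog.sales.orders")

# Land the data in a managed Delta (Bronze) table. For repeated runs you'd
# append only the new rows (e.g. filtered on a timestamp) so the table stays
# a valid streaming source.
source_df.write.mode("append").saveAsTable("main.bronze.orders")

# Streaming works on the Delta table, not on the federated Snowflake table
(spark.readStream.table("main.bronze.orders")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("main.silver.orders"))
```

You'd schedule the batch copy as a job (Workflows, Fivetran, dbt, etc.) and let the streaming query pick up whatever lands in the Bronze table.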
Regards,
Brahma
2 weeks ago
Hey @Brahmareddy, I ended up creating a Delta table as a mirror of the source Snowflake table (accessed via Lakehouse Federation). I set up logic to append only new records to the Delta table based on a timestamp column, so only records where the timestamp is greater than the current max get added.
Then I use readStream in append mode to write those new records to a staging Delta table. The downstream process picks up from this staging table—so for example, it processes new items like 3, 4, 5—and then I delete the processed records from the staging table to ensure only new data gets handled incrementally.
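Roughly like this sketch (the table names and the updated_at timestamp column are placeholders, not my exact schema):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

mirror = "main.bronze.orders_mirror"

# 1) Incremental batch append: copy only rows newer than the mirror's max timestamp
if spark.catalog.tableExists(mirror):
    max_ts = spark.table(mirror).agg(F.max("updated_at")).collect()[0][0]
else:
    max_ts = None                     # first run: take everything

source = spark.table("snowflake_catalog.sales.orders")
new_rows = source.filter(F.col("updated_at") > F.lit(max_ts)) if max_ts else source
new_rows.write.mode("append").saveAsTable(mirror)

# 2) Stream the newly appended rows from the mirror into the staging table
(spark.readStream.table(mirror)
    .writeStream
    .outputMode("append")
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/orders_mirror")
    .trigger(availableNow=True)
    .toTable("main.bronze.orders_staging"))
```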
What do you think of this approach? Am I overcomplicating it?
2 weeks ago
Hi abelian-grape,
Honestly, that's a really solid and practical approach, and you're not overcomplicating it at all. Since Lakehouse Federation doesn't support streaming or change tracking directly, creating a mirror Delta table that pulls new records based on a timestamp is a great workaround. Using readStream in append mode on that Delta table gives you streaming-like behavior, and having a staging table to feed the downstream process and then clean up processed records keeps things tidy and avoids reprocessing. It might feel a bit manual, but this pattern gives you control, reliability, and flexibility, especially when true streaming isn't available. If it works well for your volume and latency needs, I'd say you're on the right track! If you'd like to automate the cleanup or track offsets in a more reusable way, one option is sketched below.
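Instead of deleting rows from the staging table after each run, the downstream job could consume the mirror table directly with its own streaming checkpoint and foreachBatch, so Structured Streaming tracks which rows have already been processed for you. A minimal sketch, with placeholder table names and a placeholder processing step:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def process_batch(batch_df, batch_id):
    # Whatever the downstream step does today with the staged records
    batch_df.write.mode("append").saveAsTable("main.silver.orders_processed")

# The checkpoint records which mirror-table commits were already consumed,
# so there's no need to delete rows from a staging table between runs.
(spark.readStream.table("main.bronze.orders_mirror")
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/downstream")
    .trigger(availableNow=True)
    .start()
    .awaitTermination())
```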
Regards,
Brahma

