Databricks Community

mtreigelman · ‎08-27-2025

Hi, I am writing my first streaming pipeline and trying to ensure it is setup to work as a "Lakeflow" pipeline.

It is connecting an external Oracle database with some external Azure Blob storage data (all managed in the same Unity Catalog). The pipeline is just a simple join, to create a gold level table for analysts to use.

My first attempt at this looks like:

%sql
CREATE OR REPLACE STREAMING TABLE final_streaming_table_name
USING DELTA
AS
SELECT *final_cols*
FROM oracle_table_1 o1
  JOIN (SELECT *some_cols* FROM azure_table) a ON a.join_col = o1.join_col_1
  JOIN (SELECT *some_cols* FROM oracle_table_2) o2 ON o2.join_col = o1.join_col_2

But I also saw some other resources saying that I need to specify JOIN STREAM like this:

%sql
CREATE OR REPLACE STREAMING TABLE final_streaming_table_name
USING DELTA
AS
SELECT *final_cols*
FROM STREAM(azure_table) a
JOIN STREAM(oracle_table_1) o1 ON o1.join_col_1 = a.join_col
JOIN STREAM(oracle_table_2) o2 ON o2.join_col = o1.join_col_2

The external Oracle databases have no regular update schedule, and the underlying values can change at any time. I would always like my `final_streaming_table_name` to have the freshest values.

~~Is one way right? Or would they both work, and if both work what are the benefits or downsides to using a particular strategy?~~

After experimentation, it appears you cannot do it the first way 😅 Consider this CLOSED.

BS_THE_ANALYST · ‎08-28-2025

@mtreigelmanthanks for providing the update.

If you wouldn't mind, could you explain why you think the first way didn't work and why the second way did? Then you can mark your response as a solution to the question 🙂.

I found this article to be useful for joins with streaming tables: https://docs.databricks.com/aws/en/transform/join

There's some nice info to branch out to on there i.e. Stream-Static joins.

All the best,
BS

Databricks Community

First Lakeflow (DLT) Pipeline Best Practice Question

Join Us as a Local Community Builder!

🎬 Databricks Community 2025 Highlights | A Year, Built Together

🌟 Community Pulse: Your Weekly Roundup! December 22, 2025 – January 04, 2026

Solution Accelerator Series | Scale cybersecurity analytics with Splunk and Databricks

🎤 Call for Presentations: Data + AI Summit 2026 is Open!

Self-Paced Learning Festival: 09 January - 30 January 2026