cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

First Lakeflow (DLT) Pipeline Best Practice Question

mtreigelman
New Contributor III

Hi, I am writing my first streaming pipeline and trying to ensure it is setup to work as a "Lakeflow" pipeline. 

 

It is connecting an external Oracle database with some external Azure Blob storage data (all managed in the same Unity Catalog). The pipeline is just a simple join, to create a gold level table for analysts to use. 

My first attempt at this looks like: 

%sql
CREATE OR REPLACE STREAMING TABLE final_streaming_table_name
USING DELTA
AS
SELECT *final_cols*
FROM oracle_table_1 o1
  JOIN (SELECT *some_cols* FROM azure_table) a ON a.join_col = o1.join_col_1
  JOIN (SELECT *some_cols* FROM oracle_table_2) o2 ON o2.join_col = o1.join_col_2

But I also saw some other resources saying that I need to specify JOIN STREAM like this: 

%sql
CREATE OR REPLACE STREAMING TABLE final_streaming_table_name
USING DELTA
AS
SELECT *final_cols*
FROM STREAM(azure_table) a
JOIN STREAM(oracle_table_1) o1 ON o1.join_col_1 = a.join_col
JOIN STREAM(oracle_table_2) o2 ON o2.join_col = o1.join_col_2

The external Oracle databases have no regular update schedule, and the underlying values can change at any time. I would always like my `final_streaming_table_name` to have the freshest values. 

Is one way right? Or would they both work, and if both work what are the benefits or downsides to using a particular strategy? 

After experimentation, it appears you cannot do it the first way 😅 Consider this CLOSED. 

1 REPLY 1

BS_THE_ANALYST
Honored Contributor III

@mtreigelmanthanks for providing the update. 

If you wouldn't mind, could you explain why you think the first way didn't work and why the second way did? Then you can mark your response as a solution to the question 🙂.

I found this article to be useful for joins with streaming tables: https://docs.databricks.com/aws/en/transform/join 

BS_THE_ANALYST_0-1756409434209.png

There's some nice info to branch out to on there i.e. Stream-Static joins. 

All the best,
BS

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now