cancel
Showing results for 
Search instead for 
Did you mean: 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results for 
Search instead for 
Did you mean: 

Left Outer Join returns an Inner Join in Delta Live Tables

dofre
New Contributor II

In our Delta Live Table pipeline I am simply joining two streaming tables to a new streaming table.

We use the following code:

 

@Dlt.create_table()
def fact_event_faults():
    events = dlt.read_stream('event_list').withWatermark('TimeStamp', '4 hours')
    files = dlt.read_stream('file_list').withWatermark('Trigger_DateTime', '4 hours')
    event_faults = events.join(files, 
      (
        (events.TimeStamp >= (files.Trigger_DateTime - f.expr('INTERVAL 1 SECONDS'))) &
        (events.TimeStamp <= (files.Trigger_DateTime + f.expr('INTERVAL 1 SECONDS'))) &
        (events.Unique_Installation_ID == files.Unique_Installation_ID)
      ),
      how='leftouter')
    
    return event_faults

 

When running this, the event_list table has 12K records, the file_list has 4000 records but the event_faults table only has 76 records.
4bba99f9-1293-42f9-ab53-f869af77a877.jpg

In an left outer join I would expect at least 12K records in the resulting table.

Executing the same query in a notebook without streaming and without watermarking, returns the expected 12K+ rows.

What am I missing here?

0 REPLIES 0

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now