cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Left Outer Join returns an Inner Join in Delta Live Tables

dofre
New Contributor II

In our Delta Live Table pipeline I am simply joining two streaming tables to a new streaming table.

We use the following code:

 

@Dlt.create_table()
def fact_event_faults():
    events = dlt.read_stream('event_list').withWatermark('TimeStamp', '4 hours')
    files = dlt.read_stream('file_list').withWatermark('Trigger_DateTime', '4 hours')
    event_faults = events.join(files, 
      (
        (events.TimeStamp >= (files.Trigger_DateTime - f.expr('INTERVAL 1 SECONDS'))) &
        (events.TimeStamp <= (files.Trigger_DateTime + f.expr('INTERVAL 1 SECONDS'))) &
        (events.Unique_Installation_ID == files.Unique_Installation_ID)
      ),
      how='leftouter')
    
    return event_faults

 

When running this, the event_list table has 12K records, the file_list has 4000 records but the event_faults table only has 76 records.
4bba99f9-1293-42f9-ab53-f869af77a877.jpg

In an left outer join I would expect at least 12K records in the resulting table.

Executing the same query in a notebook without streaming and without watermarking, returns the expected 12K+ rows.

What am I missing here?

0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group