
Delta Live Tables running count output mode?

vroste
New Contributor III

I have a DLT pipeline with a table that I want to contain the running aggregation (for the sake of simplicity, let's assume it's a count) for each value of some key column, using a session window. The input table goes back several years, and to clean up aggregation state I want to add a watermark. Doing this, however, appears to output no rows.

I believe this is because in the default append output mode, only expired session windows are emitted. Looking at the Delta table's history, I see only appends. How do I configure the update output mode? Or is there another way to achieve my goal?

import dlt
from pyspark.sql.functions import count, session_window

@dlt.table
def running_aggregation():
    return (
        spark.readStream
            .option("withEventTimeOrder", "true")
            .table("LIVE.input_data")
            # Watermark in combination with append output mode (don't know
            # how to change it for DLT) results in only expired session
            # windows being output.
            .withWatermark("created", "365 days")
            .groupBy(session_window("created", "90 days"), "key")
            .agg(count("*").alias("running_count"))
    )
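
For comparison, outside of DLT the output mode is set on the Structured Streaming DataStreamWriter. The Delta sink itself only accepts append and complete, so update-mode semantics are usually achieved with foreachBatch plus MERGE. Below is a minimal, untested sketch of a plain per-key running count (no session window, since the Spark guide places restrictions on session windows outside append mode); the table names and checkpoint path are illustrative, not from the pipeline above:

from delta.tables import DeltaTable
from pyspark.sql.functions import count

def upsert_counts(micro_batch_df, batch_id):
    # MERGE the changed per-key counts into the target Delta table.
    target = DeltaTable.forName(spark, "running_counts")  # illustrative target
    (
        target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.key = s.key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream
        .table("input_data")                       # illustrative source
        .groupBy("key")
        .agg(count("*").alias("running_count"))
        .writeStream
        .outputMode("update")                      # emit updated rows each trigger
        .foreachBatch(upsert_counts)
        .option("checkpointLocation", "/tmp/chk/running_counts")  # illustrative path
        .start()
)

Note that this keeps aggregation state for every key indefinitely; it is only meant to show where outputMode lives in the plain Structured Streaming API.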

2 REPLIES

Kaniz
Community Manager

Hi @vroste,

• To configure the update output mode for a running aggregation in Delta Live Tables (DLT), use the outputMode option when writing the DLT table.
• By default, DLT writes data in complete mode, which outputs the complete result table after each trigger.
• To change the output mode to append mode, set the outputMode option to "append" when writing the DLT table.

harvey-c
New Contributor III

Hi Kaniz,

Could you please provide more details and an example of how to configure the output mode? The publicly available documentation on table_properties configuration for DLT does not list an outputMode option. I have also found that sometimes DLT "decided" to use complete mode instead of append mode, which results in downstream workflow errors such as: "streaming tables may only use append-only streaming sources". Please clarify. Thank you!
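
One workaround that sidesteps streaming output modes entirely: define the aggregation as a batch read, so DLT materializes the full result (including still-open session windows) on each pipeline update. A sketch, reusing the names from the original post; whether recomputing several years of input on every update is acceptable is a judgment call:

import dlt
from pyspark.sql.functions import count, session_window

@dlt.table
def running_aggregation():
    # Batch read: DLT recomputes the whole result per pipeline update,
    # so no watermark or output mode is involved and in-progress
    # session windows appear in the table.
    return (
        spark.read.table("LIVE.input_data")
            .groupBy(session_window("created", "90 days"), "key")
            .agg(count("*").alias("running_count"))
    )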
