
Delta Live Tables running count output mode?

vroste
New Contributor III

I have a DLT pipeline with a table that I want to contain the running aggregation (for the sake of simplicity, let's assume it's a count) for each value of some key column, using a session window. The input table goes back several years, and to clean up aggregation state I want to add a watermark. Doing this, however, appears to output no rows.

I believe this is because in the default append output mode, only expired session windows are emitted. Looking at the Delta table's history, I see only appends. How do I configure the update output mode? Or is there another way to achieve my goal?

import dlt
from pyspark.sql.functions import count, session_window

@dlt.table
def running_aggregation():
    return (
        spark.readStream
            .option("withEventTimeOrder", "true")
            .table("LIVE.input_data")
            # Watermark in combination with append output mode (don't know
            # how to change it for DLT) results in only expired session
            # windows being output.
            .withWatermark("created", "365 days")
            .groupBy(session_window("created", "90 days"), "key")
            .agg(count("*").alias("running_count"))
    )
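
For comparison, outside of DLT the output mode is set on the Structured Streaming DataStreamWriter. The Delta sink itself only accepts append and complete, so update-mode semantics are usually achieved with foreachBatch plus MERGE. Below is a minimal, untested sketch of a plain per-key running count (no session window, since the Spark guide places restrictions on session windows outside append mode); the table names and checkpoint path are illustrative, not from the pipeline above:

from delta.tables import DeltaTable
from pyspark.sql.functions import count

def upsert_counts(micro_batch_df, batch_id):
    # MERGE the changed per-key counts into the target Delta table.
    target = DeltaTable.forName(spark, "running_counts")  # illustrative target
    (
        target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.key = s.key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream
        .table("input_data")                       # illustrative source
        .groupBy("key")
        .agg(count("*").alias("running_count"))
        .writeStream
        .outputMode("update")                      # emit updated rows each trigger
        .foreachBatch(upsert_counts)
        .option("checkpointLocation", "/tmp/chk/running_counts")  # illustrative path
        .start()
)

Note that this keeps aggregation state for every key indefinitely; it is only meant to show where outputMode lives in the plain Structured Streaming API.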

2 REPLIES

Kaniz
Community Manager

Hi @vroste,

• To configure the update output mode for a running aggregation in Delta Live Tables (DLT), use the outputMode option when writing the DLT table.
• By default, DLT writes data in complete mode, which outputs the complete result table after each trigger.
• To change the output mode to append mode, set the outputMode option to "append" when writing the DLT table.

harvey-c
New Contributor III

Hi Kaniz,

Could you please provide more details and an example of how to configure the output mode? The publicly available documentation on table_properties configuration for DLT does not list an outputMode option. I have also found that sometimes DLT "decided" to use complete mode instead of append mode, which results in downstream workflow errors such as: "streaming tables may only use append-only streaming sources". Please clarify. Thank you!
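
One workaround that sidesteps streaming output modes entirely: define the aggregation as a batch read, so DLT materializes the full result (including still-open session windows) on each pipeline update. A sketch, reusing the names from the original post; whether recomputing several years of input on every update is acceptable is a judgment call:

import dlt
from pyspark.sql.functions import count, session_window

@dlt.table
def running_aggregation():
    # Batch read: DLT recomputes the whole result per pipeline update,
    # so no watermark or output mode is involved and in-progress
    # session windows appear in the table.
    return (
        spark.read.table("LIVE.input_data")
            .groupBy(session_window("created", "90 days"), "key")
            .agg(count("*").alias("running_count"))
    )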
