Saritha_S
Databricks Employee

Hi @Guillermo-HR 

Yes, batch is usually the right fix here.

Your query uses an event-time window aggregation in Structured Streaming with append output mode. In that mode, Spark emits a window only once the watermark has passed the window's end, i.e. once it is sure the window is closed. With monthly files and availableNow=True, nothing in the run usually advances the watermark past the final day’s 1-day window, so that window stays buffered in state and appears “missing”.

A few key points:

  • withWatermark("datetime", "0 second") does not mean “emit immediately”; it still depends on seeing later event-time data to advance the watermark past the end of the last window.
  • If your last record is on 2026-04-30 23:00, there may be no later event to push the watermark beyond the 2026-04-30 daily window.
  • append + windowed aggregation is therefore a bad fit for bounded monthly backfills/files like this.
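
To make the points above concrete, here is a small pure-Python model of the rule (not Spark code). It is a simplification: it assumes the watermark is just the maximum event time seen minus the delay, and that a 1-day window is emitted once its end is at or before the watermark. The function name and the sample timestamps other than 2026-04-30 23:00 are made up for illustration.

```python
from datetime import datetime, timedelta


def closed_daily_windows(event_times, delay=timedelta(0)):
    """Return the start of each 1-day window this simplified model would emit.

    Assumption: watermark = max(event time) - delay, and a window
    [start, start + 1 day) is emitted only when its end <= watermark.
    """
    watermark = max(event_times) - delay
    day_starts = {
        t.replace(hour=0, minute=0, second=0, microsecond=0) for t in event_times
    }
    return sorted(d for d in day_starts if d + timedelta(days=1) <= watermark)


# Hypothetical sample: the last record is on 2026-04-30 23:00.
events = [
    datetime(2026, 4, 29, 10),
    datetime(2026, 4, 30, 1),
    datetime(2026, 4, 30, 23),
]

# Only the 2026-04-29 window is closed; 2026-04-30 ends at 2026-05-01 00:00,
# which is still ahead of the watermark (2026-04-30 23:00).
emitted = closed_daily_windows(events)

# A later event (for example from the next month's file) releases 2026-04-30.
emitted_with_later_event = closed_daily_windows(events + [datetime(2026, 5, 1, 0)])
```

In this model the 2026-04-30 window only shows up once some later event-time data arrives, which is what you see when the next monthly file is processed.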

Recommendation:

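  • Use batch instead of streaming.
  • If you want to stay with streaming, use outputMode("complete"), which re-emits every window on each trigger instead of waiting for the watermark to close it (the sink must support complete mode).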