raphaelblg
Databricks Employee

Hi @Mathias,

I'd say watermarking is likely a good solution for your use case. Please take a look at the article "Control late data threshold with multiple watermark policy in Structured Streaming".
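
To make the "multiple watermark policy" idea concrete, here is a small plain-Python model (not Spark code) of how a global watermark is derived when a query reads several input streams, each with its own `withWatermark` threshold. It assumes the behaviour described in the Structured Streaming guide: each stream's watermark is its latest observed event time minus its delay, and the `spark.sql.streaming.multipleWatermarkPolicy` setting picks either the minimum (the default) or the maximum of those values. The function names and the `clicks`/`impressions` streams are hypothetical.

```python
from datetime import datetime, timedelta


def stream_watermark(event_times, delay):
    # Per-stream watermark: latest event time observed minus that stream's delay threshold.
    return max(event_times) - delay


def global_watermark(streams, policy="min"):
    # streams maps a stream name to (observed event times, delay threshold).
    # policy mirrors the two values of spark.sql.streaming.multipleWatermarkPolicy.
    per_stream = [stream_watermark(times, delay) for times, delay in streams.values()]
    if policy == "min":
        return min(per_stream)  # default: the slowest stream sets the pace
    if policy == "max":
        return max(per_stream)  # the fastest stream sets the pace
    raise ValueError(f"unknown policy: {policy!r}")


t0 = datetime(2024, 1, 1, 12, 0)
streams = {
    # Hypothetical stream: latest event at 12:30 with a 10-minute threshold -> 12:20.
    "clicks": ([t0 + timedelta(minutes=5), t0 + timedelta(minutes=30)], timedelta(minutes=10)),
    # Hypothetical stream: latest event at 12:25 with a 1-hour threshold -> 11:25.
    "impressions": ([t0 + timedelta(minutes=25)], timedelta(hours=1)),
}

print(global_watermark(streams, "min"))  # 2024-01-01 11:25:00
print(global_watermark(streams, "max"))  # 2024-01-01 12:20:00
```

In a real query you would only choose the policy, e.g. `spark.conf.set("spark.sql.streaming.multipleWatermarkPolicy", "max")`. With `min`, less data is dropped but state is held longer; with `max`, the watermark advances with the fastest stream, so data from slower streams may be dropped aggressively.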

If you want to dig in further, see also the "Handling Late Data and Watermarking" section of the Spark Structured Streaming Programming Guide.
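
The core mechanism from that section can be sketched as a plain-Python simulation (again, not Spark's implementation). It assumes the guide's semantics in simplified form: the watermark is the maximum event time seen so far minus the threshold you pass to `withWatermark`, it is recomputed after each micro-batch and never moves backward, and rows older than the current watermark are discarded. Spark itself only guarantees that data within the threshold is kept; data beyond it may or may not be dropped, and boundary handling is simplified here. `run_micro_batches` and `at` are hypothetical names.

```python
from datetime import datetime, timedelta


def run_micro_batches(batches, delay):
    """Simplified model of watermark-based late-data handling.

    batches: list of micro-batches, each a list of event-time datetimes.
    delay: the late-data threshold (the role of withWatermark's delayThreshold).
    Returns (accepted, dropped, final_watermark).
    """
    watermark = None  # no watermark exists until the first batch completes
    max_event_time = None
    accepted, dropped = [], []
    for batch in batches:
        for t in batch:
            # Rows older than the watermark computed after the previous batch are too late.
            if watermark is not None and t < watermark:
                dropped.append(t)
            else:
                accepted.append(t)
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
        if max_event_time is not None:
            # Recompute after the batch; the watermark never moves backward.
            candidate = max_event_time - delay
            if watermark is None or candidate > watermark:
                watermark = candidate
    return accepted, dropped, watermark


def at(minute):
    # Hypothetical helper: an event time at 12:<minute> on a fixed day.
    return datetime(2024, 1, 1, 12, minute)


batches = [
    [at(5), at(20)],          # watermark afterwards: 12:10
    [at(15), at(2), at(25)],  # 12:15 is out of order but within the threshold; 12:02 is dropped
    [at(14)],                 # older than the 12:15 watermark, so dropped
]
accepted, dropped, watermark = run_micro_batches(batches, timedelta(minutes=10))
print([t.strftime("%H:%M") for t in accepted])  # ['12:05', '12:20', '12:15', '12:25']
print([t.strftime("%H:%M") for t in dropped])   # ['12:02', '12:14']
print(watermark.strftime("%H:%M"))              # 12:15
```

In an actual windowed aggregation, rows that pass this filter can still update their windows; once the watermark moves past a window's end, Spark can finalize that window and discard its state, which is what keeps state size bounded.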

There are other ways to achieve what you're aiming for; I think it's more of a design decision.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks