Alberto_Umana
Databricks Employee

Hello @fperry,

This error occurs because the query contains a stateful operation that can emit rows older than the current watermark plus the allowed late record delay. Downstream stateful operations treat these rows as "late rows" and may discard them. To avoid losing them, you might need to increase the watermark duration or the allowed late record delay so that it covers the lateness of your data.
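To make the "late rows" idea concrete, here is a rough sketch in plain Python. It is not Spark code and not Spark's actual implementation. It is a simplified model that assumes the watermark equals the maximum event time seen in earlier micro-batches minus the delay, and that it only advances between batches. The function name `split_late_rows` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def split_late_rows(batches, delay):
    """Simplified model of an event-time watermark across micro-batches.

    Assumption: watermark = (max event time seen in earlier batches) - delay,
    and it only moves forward between batches. Rows whose event time is below
    the watermark are treated as late.
    """
    max_event_time = None
    kept, late = [], []
    for batch in batches:
        # The watermark used for this batch comes from earlier batches only.
        watermark = None if max_event_time is None else max_event_time - delay
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                late.append(event_time)
            else:
                kept.append(event_time)
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
    return kept, late


# Hypothetical data: the second batch contains a row that arrives 15 minutes
# behind the newest event time seen so far.
t0 = datetime(2024, 1, 1, 12, 0)
batches = [
    [t0, t0 + timedelta(minutes=20)],
    [t0 + timedelta(minutes=5), t0 + timedelta(minutes=25)],
]

kept_10, late_10 = split_late_rows(batches, timedelta(minutes=10))
kept_30, late_30 = split_late_rows(batches, timedelta(minutes=30))
print(len(late_10), len(late_30))  # prints "1 0"
```

With a 10 minute delay, the 12:05 row falls behind the watermark (12:10) and counts as late. With a 30 minute delay the same row is kept, which is the effect of increasing the watermark duration.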

If you understand the risks and still need to run the query, you can disable the correctness check by setting the following configuration:

spark.conf.set("spark.sql.streaming.statefulOperator.checkCorrectness.enabled", "false")

However, do this with caution, as it can lead to correctness issues in your streaming application.