Team,
I am struggling with an unusual issue, and I am not sure whether my understanding is wrong or this is a bug in Spark.
- I am reading a stream from Azure Event Hubs (Extract).
- Pivoting and aggregating the above DataFrame (Transformation); this is a watermarked aggregation.
- Writing the aggregation to a Delta table in append mode with a trigger (Load). A minimal sketch of the pipeline follows this list.
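For reference, here is a minimal sketch of what the pipeline looks like. The connection config, schema, column names, durations, checkpoint path, and table name are all placeholders, and the pivot step is collapsed into a plain aggregation for brevity:

```python
from pyspark.sql import functions as F

# Extract: read the stream from Event Hubs.
# eh_conf is a placeholder for the Event Hubs connection options.
raw = (
    spark.readStream
         .format("eventhubs")
         .options(**eh_conf)
         .load()
)

# event_schema is a placeholder; it yields eventtimestamp, key, value.
events = raw.select(
    F.from_json(F.col("body").cast("string"), event_schema).alias("e")
).select("e.*")

# Transformation: watermark and window on the SAME eventtimestamp column.
agg = (
    events
        .withWatermark("eventtimestamp", "10 minutes")
        .groupBy(F.window("eventtimestamp", "5 minutes"), "key")
        .agg(F.sum("value").alias("total"))
)

# Load: append finalized windows to a Delta table on a processing-time trigger.
query = (
    agg.writeStream
       .format("delta")
       .outputMode("append")
       .option("checkpointLocation", "/tmp/chk/agg")  # placeholder path
       .trigger(processingTime="1 minute")
       .toTable("events_agg")  # placeholder table name
)
```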
However, the most recent message published to Event Hubs is never written to Delta, even after its event time has fallen outside the watermark.
I am using the same eventtimestamp column for both the windowing and the watermark.
My understanding is that a window's aggregate should be appended to the Delta table once "max event time" (the timestamp of the latest message) has advanced past the window end plus the watermark delay.
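To make that expectation concrete, here is the timeline I would expect with the illustrative 10-minute watermark and 5-minute windows from the sketch above:

```python
# Illustrative timeline (10-minute watermark, 5-minute tumbling windows):
#
#   event @ 12:03  -> lands in window [12:00, 12:05)
#   event @ 12:18  -> max event time = 12:18
#                     watermark = 12:18 - 10 min = 12:08 > 12:05,
#                     so window [12:00, 12:05) should finalize and its
#                     row should be appended to the Delta table.
#
# Observed: the window holding the latest event (12:18 here, i.e.
# [12:15, 12:20)) is never appended, even long after the wall clock
# passes 12:20 + 10 minutes.
```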
This is causing data loss.
Moreover, for a graceful shutdown, all events held in in-memory state should be flushed to the sink before the stream is stopped.
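The shutdown sequence I have in mind is roughly the following (a sketch; query is the StreamingQuery handle returned by writeStream in the pipeline sketch above):

```python
# Drain everything currently available at the source, then stop the query.
query.processAllAvailable()   # block until all available input is processed
query.stop()                  # then stop the stream
query.awaitTermination()      # wait for termination to complete
```

Even after this sequence, the rows for windows still inside the watermark are not appended to the sink.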
Please advise.