Hey,
I'm trying to perform Time window aggregation in two different streams followed by stream-stream window join described here. I'm running Databricks Runtime 13.1, exactly as advised.
However, when I'm reproducing the following code:
clicksWindow = clicksWithWatermark.groupBy(
clicksWithWatermark.clickAdId,
window(clicksWithWatermark.clickTime, "1 hour")
).count()
impressionsWindow = impressionsWithWatermark.groupBy(
impressionsWithWatermark.impressionAdId,
window(impressionsWithWatermark.impressionTime, "1 hour")
).count()
clicksWindow.join(impressionsWindow, "window", "inner")
I'm not getting any result from the joined table in append mode. It is just empty no matter whether I'm using AdId in groupBy or not. The same behaviour is in Python and Scala.
If I join on window.end, not window, then I start receiving results but then I can use only inner join (as the joined condition, window.end, is not a watermarked column) but I need do to outer join for my use case (even with inner join, state seems to increase indefinitely).
Any help with reproducing this example is appreciated