Stream-stream window join after time window aggregation not working in 13.1
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-15-2023 12:59 AM
Hey,
I'm trying to perform Time window aggregation in two different streams followed by stream-stream window join described here. I'm running Databricks Runtime 13.1, exactly as advised.
However, when I'm reproducing the following code:
clicksWindow = clicksWithWatermark.groupBy(
clicksWithWatermark.clickAdId,
window(clicksWithWatermark.clickTime, "1 hour")
).count()
impressionsWindow = impressionsWithWatermark.groupBy(
impressionsWithWatermark.impressionAdId,
window(impressionsWithWatermark.impressionTime, "1 hour")
).count()
clicksWindow.join(impressionsWindow, "window", "inner")
I'm not getting any result from the joined table in append mode. It is just empty no matter whether I'm using AdId in groupBy or not. The same behaviour is in Python and Scala.
If I join on window.end, not window, then I start receiving results but then I can use only inner join (as the joined condition, window.end, is not a watermarked column) but I need do to outer join for my use case (even with inner join, state seems to increase indefinitely).
Any help with reproducing this example is appreciated
- Labels:
-
Databricks Runtime
-
Joins
-
Stream
-
Trying

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-18-2023 05:57 AM
Hi @Andrzej Zera
Great to meet you, and thanks for your question!
Let's see if your peers in the community have an answer to your question. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-27-2023 06:05 AM
Hey,
I'm currently facing the same problem, so I would to know if you've made any progress in resolving this issue.

