I am performing stream-to-stream join in Databricks using MongoDB as a source (readStream()). Both sources collections receive data at same time. Initially I tried with using watermarks
orderWithWatermark = order \
.selectExpr("order_id AS orderId","event_created AS orderWatermark","approved_date")\
.withWatermark("orderWatermark", "2 minutes")
orderstatusWithWatermark = orderstatus \
.selectExpr("order_id AS orderstatusId","event_created AS orderstatusWatermark","order_date","amount")\
.withWatermark("orderstatusWatermark", "2 minutes")
when i joined them
joined_df=orderWithWatermark.join(
orderstatusWithWatermark,
expr("""
orderId = orderstatusId AND
icb.icbWatermark >= ica.icaWatermark
"""))
and writing the joined_df using writestream to mongodb sink
I am facing an issue
[STREAM_FAILED] Query [id = 37bd39d7-dde1-4d19-8e9f-f4718c27dca4, runId = 735152c6-0398-46aa-b673-a46ac8fa1848] terminated with exception: true SQLSTATE: XXKST I need help is there a recommended way to handle issue ?