cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Structured Streaming Delta Table - Reading and writing from same table

sparkrookie
New Contributor II

Hi 

I have a structured streaming job that reads from a delta table "A" and pushes to another delta table "B".

A Schema - group_key, id, timestamp, value
B Schema - group_key, watermark_timestamp, derived_value

One requirement is that i need to get the max watermark_timestamp from "B" for each group (group_key) and then join that with "A" to filter only all those messages for each group than are > each group's watermark_timestamp. After processing those data and updating state, I need to get the max timestamp from those messages and append in B's watermark_timestamp field for each group. Apart from this, i will push some additional data as well in derived_value column to use downstream. 

Basically the above ensures that already processed data does not again come into the stream.

Problem is I am reading from same table as I am writing. When I execute this my job is not succeeding at all when I put B as sink. When I change B to a different table say C then it proceeds.

I tried everything. I tried collect B max group data before the stream even starts. Still not working,

Whats the solution for this? Could someone please help.

Additionally in general if i have a requirement where I need to buffer data for days, I dont want to store everything in memory, apply watermark in arbitary stateful processing and then filter. Whats the best way to solve this problem. I was thinking of using SQL queries which is what my above does.

 

1 REPLY 1

Thanks. Could you please point me to the thread/link which provides a solution for this. I have been blocked for a long time on this and this would really help.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group