cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Maintaining Custom State in Structured Streaming

Starki
New Contributor II

I am consuming an IoT stream with thousands of different signals using Structured Streaming. During processing of the stream, I need to know the previous timestamp and value for each signal in the micro batch. The signal stream is eventually written to a delta table. Every signal is expected to be sent at least once every hour.

Is it possible to make use of the internal State Store as a cache to store this custom state of the previous timestamp and value for each signal?

If not, what would be the canonical approach to maintain such a state?

These are the approaches that I can think of.

Approach 1:

Perform a join on the stream with the target table itself to get the previous signal timestamp and value.

Approach 2:

Maintain a separate โ€˜state tableโ€™ containing the previous timestamp and value for each signal. The โ€˜state tableโ€™ would then be joined with the stream to get the previous signal timestamp and value.

On receiving new signal values, the โ€˜state tableโ€™ would be updated using merge into.

1 REPLY 1

Soma
Valued Contributor

@Suteja Kanuriโ€‹ 

Tried the above on streaming DF

But facing the below error

AttributeError: 'DataFrame' object has no attribute 'groupByKey'

Can you please let me know DBR runtime

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.