Data Engineering
Strategy to add a new table based on silver data

Joe1912
New Contributor III

I have a merge function used in a streaming foreachBatch, something like:

def mergedf(df, i):
    merge_func_1(df, i)
    merge_func_2(df, i)

Now I want to add a new merge_func_3 to it.

Are there any best practices for this case? Since the stream is always running, how can I process the data from the beginning for merge_func_3? Is the only way to stop the stream, create a temporary job to backfill func_3, and then restart the stream with merge_func_3 added?
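For context, here is a minimal pure-Python sketch of the fan-out pattern in the snippet above. No Spark is required to follow it: the dict of lists stands in for the target Delta tables, and the merge functions just record rows, where a real pipeline would run a MERGE INTO per target.

```python
# In-memory stand-ins for the Delta target tables each merge step writes to.
TABLES = {"target_1": [], "target_2": []}

def merge_func_1(df, batch_id):
    # Placeholder for: MERGE INTO target_1 USING df ...
    TABLES["target_1"].extend(df)

def merge_func_2(df, batch_id):
    # Placeholder for: MERGE INTO target_2 USING df ...
    TABLES["target_2"].extend(df)

# Registry of merge steps; adding merge_func_3 later means appending here.
MERGE_FUNCS = [merge_func_1, merge_func_2]

def mergedf(df, batch_id):
    # This is the function passed to writeStream.foreachBatch(mergedf):
    # every micro-batch is handed to each registered merge function.
    for fn in MERGE_FUNCS:
        fn(df, batch_id)
```

Keeping the merge steps in a list like this makes the "add a new function" question concrete: the stream picks up merge_func_3 for new batches as soon as it is registered, but nothing reprocesses old data for it, which is exactly the gap the question asks about.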

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Joe1912, when adding a new merge function to a streaming data pipeline, you have a few options. If the new function only needs to apply to data arriving from now on, you can simply add it to your existing foreachBatch handler. To also cover historical data, run a one-off batch job that reprocesses the existing data with the new function before (or alongside) updating the handler. Alternatively, you can use Spark's checkpointing to recover state and replay the data so the new function sees everything. The choice depends on your needs: a batch backfill is simpler but may introduce some data duplication, while the checkpoint-based approach avoids duplication and maintains state.
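The first option (one-off batch backfill, then extend the running handler) can be sketched as below. This is a hedged illustration, not Databricks API: the list stands in for the historical silver table, `batch_id=-1` is an arbitrary sentinel for rows processed outside the stream, and all names are made up for the example.

```python
TARGET_3 = []  # in-memory stand-in for merge_func_3's target table

def merge_func_3(df, batch_id):
    # Placeholder for: MERGE INTO target_3 USING df ...
    TARGET_3.extend(df)

MERGE_FUNCS = []  # the list the running foreachBatch handler iterates over

def backfill(fn, historical_rows):
    # One-off batch job over all historical silver data; in real code this
    # would be driven by spark.read.table("silver") rather than a list.
    fn(historical_rows, batch_id=-1)  # sentinel id marking the backfill run

# 1) Reprocess history so target_3 is not missing old data.
backfill(merge_func_3, ["old_row_a", "old_row_b"])

# 2) Register the function so future micro-batches also flow into it.
MERGE_FUNCS.append(merge_func_3)
```

Because a micro-batch can overlap the backfill window, the usual safeguard is to make the merge idempotent, for example a MERGE keyed on a stable business key, so reprocessed rows update rather than duplicate.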


2 REPLIES


Kaniz
Community Manager

Hi @Joe1912, thank you for taking the time to select the most suitable solution. It's great to hear that your query has been successfully resolved, and thank you for your contribution.




 
