I have a data pipeline that runs continuously, processes data in micro-batches, and stores it in Delta Lake. This takes care of any new data.
But at times, I need to process historical data without disturbing the real-time data processing.
Is there a suggested approach for this scenario? I'd appreciate any help.