Michał
New Contributor III

Thank you both for the answers. 

@mmayorga the data is fine and the architecture is fine. After thinking about it, and despite how I phrased the initial question, what I appear to be missing is something better than an all-or-nothing approach during the initial processing of hundreds of millions of rows from a streaming source.

Regardless of what the problem is, when I'm processing, for example, 400 million rows and after a few days there is a problem with row 399,999,123, I'd like to be able to fix the problem and restart processing from, say, row 399,000,001 rather than from the very beginning. Is there a way to do it?
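
To make concrete what I'm after, here is a rough sketch in plain Python. It is not a Spark or Databricks API, and every name in it (`process_with_checkpoints`, `handle_row`, the JSON checkpoint file) is hypothetical. It only illustrates the behaviour: progress is recorded after each batch, so a failure costs at most one batch of rework.

```python
import json
import os


def load_checkpoint(checkpoint_path):
    """Return the number of rows already completed, or 0 if no checkpoint exists."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["rows_done"]


def save_checkpoint(checkpoint_path, rows_done):
    """Write the checkpoint to a temp file, then swap it in with os.replace."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"rows_done": rows_done}, f)
    os.replace(tmp_path, checkpoint_path)


def process_with_checkpoints(rows, handle_row, checkpoint_path, batch_size=1000):
    """Process rows in order, saving progress after every full batch.

    Assumes `rows` yields the same rows in the same order on every run.
    """
    start = load_checkpoint(checkpoint_path)
    processed = start
    for index, row in enumerate(rows):
        if index < start:
            continue  # covered by the last saved checkpoint
        handle_row(row)  # may raise; the last saved checkpoint stays intact
        processed = index + 1
        if processed % batch_size == 0:
            save_checkpoint(checkpoint_path, processed)
    save_checkpoint(checkpoint_path, processed)  # record the final partial batch
    return processed
```

In this sketch, rows handled after the last saved checkpoint are processed again on restart, so `handle_row` would need to be safe to repeat. What I'm looking for is the equivalent of this for the initial load from the streaming source.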