how to process a streaming lakeflow declarative pipeline in batches

Michał · 09-03-2025

Hi,

I've got a problem and I have run out of ideas as to what else I can try. Maybe you can help?

I've got a Delta table with hundreds of millions of records on which I have to perform relatively expensive operations. I'd like to be able to process some of the records, stop the process, then restart it from where it left off: the usual streaming thing. My problem is that, the first time the pipeline runs, it appears to have to process all of the records successfully before any of the output is persisted in the target table.

I tried setting Spark options to limit the maximum number of files to read and the maximum amount of data to read, always with the same behaviour: all-or-nothing processing on the first run.
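
For reference, here is a minimal sketch of the kind of rate-limit settings meant here. It assumes the Delta streaming source options `maxFilesPerTrigger` and `maxBytesPerTrigger`; the post does not show the exact code that was tried. The helper name `limited_stream`, the option values, and the table name in the usage note are hypothetical.

```python
# Hypothetical rate-limit settings for a Delta streaming source.
# The values are illustrative, not recommendations.
RATE_LIMIT_OPTIONS = {
    "maxFilesPerTrigger": "100",  # upper bound on files read per micro-batch
    "maxBytesPerTrigger": "1g",   # soft upper bound on data read per micro-batch
}


def limited_stream(spark, table_name, options=RATE_LIMIT_OPTIONS):
    """Return a streaming read of `table_name` with the given options applied.

    `spark` is expected to be an active SparkSession (or anything exposing
    the same `readStream.option(...).table(...)` interface).
    """
    reader = spark.readStream
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.table(table_name)
```

Inside a pipeline, a table function decorated with `@dlt.table` would return something like `limited_stream(spark, "catalog.schema.source_table")`. These options only cap how much each micro-batch reads; whether the first run persists output after each micro-batch is the open question in this post.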

Could you point me to a reliable resource documenting how to control the batch size of Lakeflow Declarative Pipelines?
