Good question. In the nominal scenario, the streams carry change data capture (CDC) records from the data sources. These batches are usually small, with spikes to ~10k records when bulk operations occur at the data source. However, even when a batch is small, it is joined with much larger tables.

The same streams are also used to (re)initialize the table contents by sending a snapshot of the full table from the data source. In that scenario, a batch can reach 500M records, amounting to gigabytes of data.