How to limit batch size from Confluent Kafka

AdamRink · 11-28-2022

I have a large stream of data that I read from Confluent Kafka, more than 500 million rows. When I initialize the stream, I cannot control the size of the batches that are read.

I've tried the following:

Setting options on the readStream: maxBytesPerTrigger, maxOffsetsPerTrigger, fetch.max.bytes, max.poll.records

Setting the Spark cluster option maxRatePerPartition

Starting with a fresh checkpoint
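
For reference, here is a minimal PySpark sketch of where these settings would normally go, assuming the Structured Streaming Kafka source (format "kafka"). The broker address, the topic name, and the helper names kafka_read_options and build_stream are hypothetical and not from my actual job. As far as I can tell from the Spark documentation, maxOffsetsPerTrigger is the rate limit the Kafka source itself documents, Kafka consumer properties such as fetch.max.bytes are passed with a "kafka." prefix, and maxRatePerPartition belongs to the older DStream API rather than to readStream.

```python
def kafka_read_options(bootstrap_servers, topic, max_offsets_per_trigger,
                       consumer_config=None):
    """Build the option map for spark.readStream.format("kafka")."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only applies when the query starts without an existing checkpoint.
        "startingOffsets": "earliest",
        # Source-level rate limit: maximum number of offsets per trigger,
        # split proportionally across the topic's partitions.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }
    # Kafka consumer properties are forwarded only with the "kafka." prefix.
    for key, value in (consumer_config or {}).items():
        options["kafka." + key] = str(value)
    return options


def build_stream(spark, options):
    """Create the streaming DataFrame from an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical usage (requires a running SparkSession named `spark`):
# opts = kafka_read_options("broker:9092", "events", 100000,
#                           {"fetch.max.bytes": 52428800})
# df = build_stream(spark, opts)
```

One thing that may be worth checking alongside this is the trigger on the write side: the documentation describes a one-time trigger as processing all available data in a single batch, which would bypass a per-trigger limit, whereas an available-now or processing-time trigger splits the work into micro-batches according to the source options.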