topic Re: How limit input rate reading delta table as stream? in Data Engineering

How to limit the input rate when reading a Delta table as a stream?

Lulka — Tue, 21 Feb 2023 07:55:17 GMT

Hello everyone!

I am trying to read a Delta table as a streaming source using Spark, but my micro-batches are unbalanced: one is very small and the others are huge. How can I limit this?

I tried different values for maxBytesPerTrigger and maxFilesPerTrigger, but nothing changes; the batch size is always the same.

Does anyone have any ideas?

```python
df = spark \
    .readStream \
    .format("delta") \
    .load("...")

df \
    .writeStream \
    .outputMode("append") \
    .option("checkpointLocation", "...") \
    .table("...")
```
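The options may seem to have no effect for a reason that this simplified model illustrates. It is a sketch, not Delta's actual source code. In a real job, the limits are set on the reader, for example `.option("maxFilesPerTrigger", 10)` on `spark.readStream`. Delta's documentation describes maxBytesPerTrigger as a soft maximum, and when both options are set, a batch ends at whichever limit is reached first. The hypothetical `plan_batches` function below admits whole files in order under those rules, so a single very large file still forms one very large batch.

```python
from typing import List, Optional


def plan_batches(
    file_sizes: List[int],
    max_files: int = 1000,  # mirrors the documented maxFilesPerTrigger default
    max_bytes: Optional[int] = None,
) -> List[List[int]]:
    """Simplified model of file-based rate limiting (not Delta's code).

    Files (sizes in bytes, in commit order) are admitted whole. A batch ends
    once it holds max_files files or its total reaches max_bytes, whichever
    comes first. A file is admitted even if it alone exceeds max_bytes, so
    max_bytes acts as a soft limit.
    """
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Admit the whole file; files are never split across batches.
        current.append(size)
        current_bytes += size
        files_full = len(current) >= max_files
        bytes_full = max_bytes is not None and current_bytes >= max_bytes
        if files_full or bytes_full:
            batches.append(current)
            current, current_bytes = [], 0
    if current:
        # Leftover files form the last (possibly small) batch.
        batches.append(current)
    return batches


MB = 1024 * 1024
# Hypothetical skewed layout: one 1 GB file followed by four 8 MB files.
skewed = [1024 * MB] + [8 * MB] * 4
for batch in plan_batches(skewed, max_files=2, max_bytes=64 * MB):
    print(len(batch), "files,", sum(batch) // MB, "MB")
# Prints three lines: "1 files, 1024 MB", "2 files, 16 MB", "2 files, 16 MB"
```

With a skewed layout like this, tightening either limit only changes how the small files are grouped. The 1 GB file is never split, because the limits operate on whole files.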

Kind Regards


-werners- — Tue, 21 Feb 2023 12:12:45 GMT

Besides the parameters you mention, I don't know of any other option that controls the batch size.

Did you check whether the Delta table is horribly skewed?
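One rough way to check for this kind of skew is to look at the sizes of the data files the table currently references. The sketch below assumes the Delta transaction log layout: newline-delimited JSON commit files under `_delta_log/`, with `add` actions carrying `path` and `size`, and `remove` actions carrying `path`. It also assumes that every commit file since version 0 still exists. It ignores checkpoint Parquet files, so on a table whose old log entries have been cleaned up, it will miss files. `live_file_sizes` and `size_summary` are hypothetical helpers, not part of any Delta API.

```python
import json
import os
import statistics
from typing import Dict


def live_file_sizes(table_path: str) -> Dict[str, int]:
    """Replay JSON commits in _delta_log and return {path: size in bytes}
    for data files that are still part of the table."""
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files are named with zero-padded versions, so a plain sort is
    # chronological. Assumption: no commits were removed by log cleanup;
    # checkpoints, .crc files and _last_checkpoint are skipped.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    )
    live: Dict[str, int] = {}
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    live[action["add"]["path"]] = action["add"]["size"]
                elif "remove" in action:
                    live.pop(action["remove"]["path"], None)
                # Other actions (commitInfo, metaData, protocol) are ignored.
    return live


def size_summary(sizes: Dict[str, int]) -> Dict[str, float]:
    """Basic distribution stats; a large max/median ratio suggests skew."""
    if not sizes:
        raise ValueError("no live files found")
    values = sorted(sizes.values())
    median = statistics.median(values)
    return {
        "files": len(values),
        "min": values[0],
        "median": median,
        "max": values[-1],
        "max_over_median": values[-1] / median if median else float("inf"),
    }
```

For example, call `size_summary(live_file_sizes("/path/to/table"))` on a locally accessible copy of the table (the path is a placeholder). A `max_over_median` far above 1 means a few files hold most of the data, which would produce the unbalanced micro-batches described in the question.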


Lulka — Mon, 27 Feb 2023 16:52:53 GMT

Thanks, you are right! The data was very skewed.