Hello everyone!
I am trying to read a Delta table as a streaming source using Spark, but my micro-batches are imbalanced: some are very small and others are very large. How can I limit the batch size?
I have tried different configurations of maxBytesPerTrigger and maxFilesPerTrigger, but nothing changes; the batch sizes stay the same (see the sketch after my code below).
Does anyone have any ideas?
# Read the Delta table as a streaming source
df = spark \
    .readStream \
    .format("delta") \
    .load("...")

# Append each micro-batch to the target table, tracking progress via the checkpoint
df \
    .writeStream \
    .outputMode("append") \
    .option("checkpointLocation", "...") \
    .table("...")
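
For reference, here is roughly how I set the rate-limit options on the read side (a sketch, not my exact code; the option values are just examples, and the paths are elided as above):

# Sketch of one configuration I tried; the values are illustrative.
# maxFilesPerTrigger caps the number of new files considered per micro-batch;
# maxBytesPerTrigger is a soft cap on the amount of data per micro-batch.
df = spark \
    .readStream \
    .format("delta") \
    .option("maxFilesPerTrigger", 100) \
    .option("maxBytesPerTrigger", "1g") \
    .load("...")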
Kind Regards