Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes).

shan_chandra — Mon, 11 Oct 2021 16:46:16 GMT

I got the error below when running a streaming workload that reads from a source Delta table:

Caused by: java.lang.RuntimeException: Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes). As a workaround, you can reduce the vectorized reader batch size, or disable the vectorized reader, or disable spark.sql.sources.bucketing.enabled if you read from bucket table. For Parquet file format, refer to spark.sql.parquet.columnarReaderBatchSize (default 4096) and spark.sql.parquet.enableVectorizedReader; for ORC file format, refer to spark.sql.orc.columnarReaderBatchSize (default 4096) and spark.sql.orc.enableVectorizedReader

Could you please let us know how to mitigate the issue?

Re: Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes).

shan_chandra — Mon, 11 Oct 2021 16:49:57 GMT

This happens when the Delta/Parquet source has one or more of the following:

a huge number of columns
huge strings in one or more columns
huge arrays or maps, possibly nested within each other

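To mitigate this issue, reduce spark.sql.parquet.columnarReaderBatchSize from its default value of 4096. The batch size is the number of rows the vectorized reader loads per batch, so a smaller value means each batch needs less contiguous memory.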
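As a rough sketch of how the batch-size reduction could be applied from a notebook: the config keys below are the ones named in the error message, while the helper names (reader_overrides, apply_overrides) and the value 1024 are only illustrative assumptions, not a recommended setting. It also assumes an existing SparkSession named spark and that the settings are applied before the streaming query is started.

```python
# Spark SQL settings named in the error message (Parquet reader).
BATCH_SIZE_KEY = "spark.sql.parquet.columnarReaderBatchSize"
VECTORIZED_KEY = "spark.sql.parquet.enableVectorizedReader"
DEFAULT_BATCH_SIZE = 4096  # default quoted in the error message


def reader_overrides(batch_size=1024, disable_vectorized=False):
    """Build the config overrides to try (hypothetical helper).

    batch_size=1024 is an illustrative starting point, not a tuned value.
    """
    if not 0 < batch_size < DEFAULT_BATCH_SIZE:
        raise ValueError("batch_size should be positive and below the default of 4096")
    overrides = {BATCH_SIZE_KEY: str(batch_size)}
    if disable_vectorized:
        # Second workaround from the error message: turn the vectorized reader off.
        overrides[VECTORIZED_KEY] = "false"
    return overrides


def apply_overrides(spark, overrides):
    """Set each override on an existing SparkSession through spark.conf.set."""
    for key, value in overrides.items():
        spark.conf.set(key, value)


# Usage, assuming `spark` is an existing SparkSession:
# apply_overrides(spark, reader_overrides(1024))
```

If the error persists at a smaller batch size, the value can be lowered again, or reader_overrides(..., disable_vectorized=True) can be used to try the second workaround from the error message. Disabling the vectorized reader is likely to slow down reads, so it is probably best kept as a fallback.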
