Hi @eva_mcmf, The error occurs because the size of your binary file exceeds the maximum length Spark allows for a single binary record, which is 2147483647 bytes (about 2 GB). The file you're trying to read is about 7 GB, well beyond this limit.
Unfortunately, there's no direct way to read such large files using the binaryFile format in Databricks.
However, you might consider the following alternatives:
1. Split the large file into smaller chunks within the size limit before reading them into Databricks. This can be done outside of Databricks, using various file-splitting tools depending on your operating system.
2. If the format supports it (like Parquet or Avro), you can read it as a distributed, splittable file. These formats allow Spark to read portions of the file in parallel across multiple nodes in the cluster, so no single task ever has to hold the whole file, and the per-file size limit doesn't apply. However, this would not work with your rosbag file format, which Spark cannot split.
3. If the large file is written to a remote system like S3, you can control the size of each part file as suggested in the third source, so that no individual file exceeds the limit. This might help to avoid the error, but it's not guaranteed to work in all scenarios, especially if the file size is significantly larger than the limit.
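For option 1, a minimal sketch of the splitting step in plain Python (the function name `split_binary_file` and the paths are illustrative, not part of any Databricks API; in practice you'd run this wherever the 7 GB file lives before uploading the chunks):

```python
import os

def split_binary_file(src_path, out_dir, chunk_size):
    """Split src_path into numbered chunks of at most chunk_size bytes each."""
    os.makedirs(out_dir, exist_ok=True)
    part_paths = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of file reached
                break
            part_path = os.path.join(out_dir, f"part-{index:05d}.bin")
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths

# In Databricks you could then point binaryFile at the whole directory,
# e.g. (path is hypothetical):
# df = spark.read.format("binaryFile").load("dbfs:/tmp/rosbag_chunks/")
```

Each resulting part stays under the 2 GB record limit, so binaryFile can load them; note the chunks are raw byte ranges, so any tool that later consumes the rosbag data would need to reassemble them in order.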