Hello, I am facing an error while trying to read a large binary file (rosbag format) using the binaryFile reader. The file I am trying to read is approximately 7 GB. Here's the error message I am getting:
FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.
Here's the code:
from pyspark.sql.types import (
    StructType,
    StructField,
    StringType,
    TimestampType,
    LongType,
    BinaryType,
)

BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)

binary_df = (
    spark.read.format("binaryFile")
    .schema(BINARY_FILES_SCHEMA)
    .load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
)
binary_df.printSchema()
display(binary_df)
As I understand it, 2147483647 is Int.MaxValue, so it looks like the binaryFile source needs the entire file to fit into a single JVM byte array. Is there a way to read such large files in Databricks?
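One workaround I've been sketching is to bypass the Spark binaryFile reader entirely and stream the file in fixed-size chunks with plain Python file I/O. This is only a sketch under assumptions: it assumes the mount is also reachable through the /dbfs FUSE path, and the chunk size and the `process` handler are placeholders:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per chunk, comfortably under the 2 GiB array limit


def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of a file as bytes objects."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# On Databricks the mounted file should also be visible via the FUSE path, e.g.:
# for chunk in read_in_chunks("/dbfs/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag"):
#     process(chunk)  # hypothetical per-chunk handler
```

Each yielded chunk is an independent bytes object, so memory usage stays bounded by the chunk size rather than the full 7 GB file.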