Error ingesting very large binary files with the binaryFile reader

eva_mcmf

Hello, I am getting an error when reading a large binary file (rosbag format) with the binaryFile reader. The file is approximately 7 GB. Here is the error message:

FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.
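For reference, the "max length allowed" in the message is 2^31 - 1 bytes, the largest signed 32-bit integer. This is consistent with the binaryFile reader turning each file into a single record whose `content` column holds the whole file as one binary value. A quick check of the arithmetic, using only the two numbers from the error (the variable names are illustrative):

```python
import math

# Values copied from the error message above.
file_length = 7_156_086_862  # bytes reported for the .bag file
max_length = 2_147_483_647   # "max length allowed" in the error

# The cap is exactly 2**31 - 1, the largest signed 32-bit integer.
assert max_length == 2**31 - 1

size_gib = file_length / 1024**3               # file size in GiB
ratio = file_length / max_length               # how many times over the cap
min_chunks = math.ceil(file_length / max_length)  # fewest pieces that each fit under the cap

print(f"{size_gib:.2f} GiB, {ratio:.2f}x the limit, needs at least {min_chunks} chunks")
```

So the file is a little over three times the per-value cap, and it would have to be split into at least four pieces for each piece to fit.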

Here's the code:

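from pyspark.sql.types import BinaryType, LongType, StringType, StructField, StructType, TimestampType

# Schema of the binaryFile data source: one row per file, with the whole file in "content"
BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)

# `spark` and `display` are provided by the Databricks notebook
binary_df = (
    spark.read.format("binaryFile")
    .schema(BINARY_FILES_SCHEMA)
    .load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
)
binary_df.printSchema()
display(binary_df)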
 
Is there a way to read such large files in Databricks?
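One possible direction, offered as a sketch under assumptions rather than a confirmed Databricks recipe: avoid materialising the whole file as one value and read it in byte ranges that each stay below the 2,147,483,647-byte cap. The sketch assumes the file is reachable as an ordinary local path (DBFS mounts are often exposed under `/dbfs/...`, e.g. `/dbfs/mnt/0-landingzone/tla/...`, but whether that works depends on the cluster's access mode). The helper names `byte_ranges`, `read_range` and `iter_chunks` and the 256 MiB chunk size are hypothetical choices for illustration. Chunk boundaries will not line up with rosbag records, so the pieces would still need bag-aware parsing downstream.

```python
import os
from typing import Iterator, Tuple

# Hypothetical chunk size, chosen to stay well under the 2**31 - 1 byte per-value cap.
DEFAULT_CHUNK_SIZE = 256 * 1024 * 1024
MAX_VALUE_BYTES = 2**31 - 1


def byte_ranges(total_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover [0, total_size) without overlap."""
    if not 0 < chunk_size <= MAX_VALUE_BYTES:
        raise ValueError("chunk_size must be positive and at most 2**31 - 1 bytes")
    for offset in range(0, total_size, chunk_size):
        # The last range may be shorter than chunk_size.
        yield offset, min(chunk_size, total_size - offset)


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def iter_chunks(path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Stream a large file as (offset, bytes) pieces instead of one huge value."""
    total = os.path.getsize(path)
    for offset, length in byte_ranges(total, chunk_size):
        yield offset, read_range(path, offset, length)
```

For the 7,156,086,862-byte file in the error, `byte_ranges` with the default chunk size yields 27 ranges. Because each range is independent, the (offset, length) pairs could in principle be distributed (for example with `spark.sparkContext.parallelize`) so that each task reads only its own slice, assuming the path is visible on the workers.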