Get Started Discussions

Very large binary files ingestion error when using binaryFile reader

eva_mcmf
New Contributor II

Hello, I am facing an error while trying to read a large binary file (rosbag format) with the binaryFile reader. The file is approximately 7 GB. Here is the error message:

FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.
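The cap reported in the exception is the JVM's maximum array length: the binaryFile source loads each file's entire contents into a single BinaryType column, i.e. one byte array, so no file larger than 2147483647 bytes (~2 GiB) can be read this way. A quick check of the arithmetic:

```python
# The limit in the error is Int.MaxValue, the JVM's maximum array
# length; a file's bytes must fit into one BinaryType value.
MAX_BYTES = 2**31 - 1          # 2147483647, exactly the limit reported
FILE_BYTES = 7156086862        # the .bag file's length from the error
print(FILE_BYTES > MAX_BYTES)  # True: the file cannot fit in one row
```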

Here's the code:

from pyspark.sql.types import (
    BinaryType,
    LongType,
    StringType,
    StructField,
    StructType,
    TimestampType,
)

BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)
binary_df = (
    spark.read.format("binaryFile")
    .schema(BINARY_FILES_SCHEMA)
    .load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
)
binary_df.printSchema()
display(binary_df)
 
Is there a way to read such large files in Databricks? 
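One possible workaround, since binaryFile materializes the whole file in a single column, is to bypass Spark for the raw bytes and stream the file in fixed-size chunks (on Databricks, a mounted path can typically be opened through the /dbfs FUSE mount, e.g. /dbfs/mnt/0-landingzone/tla/...). A minimal sketch; the helper name, chunk size, and demo file are illustrative assumptions, not a Databricks API:

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per chunk; tune to available memory


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, bytes) chunks so no single buffer nears the 2 GB cap."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)


# Demo on a small temporary file; on Databricks the path would be the
# FUSE-mounted equivalent of the .bag file instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1_000_000))
    demo_path = tmp.name

total = sum(len(chunk) for _, chunk in iter_chunks(demo_path, chunk_size=256 * 1024))
print(total)  # 1000000 -- the chunks reassemble to the original length
os.remove(demo_path)
```

The chunks could then be parsed incrementally (e.g. with a rosbag library on the driver) or written out as smaller objects that binaryFile can ingest.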
0 REPLIES
