Community Discussions

Very large binary files ingestion error when using binaryFile reader

eva_mcmf
New Contributor II

Hello, I am getting an error while trying to read a large binary file (rosbag format) using the binaryFile reader. The file I am trying to read is approximately 7 GB. Here's the error message I am getting:

FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.

Here's the code:

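from pyspark.sql.types import BinaryType, LongType, StringType, StructField, StructType, TimestampType

# Schema of the binaryFile source: one row per file, whole file contents in "content"
BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)

# `spark` and `display` are predefined in Databricks notebooks
binary_df = spark.read.format("binaryFile").schema(BINARY_FILES_SCHEMA).load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
binary_df.printSchema()
display(binary_df)  # an action: this is where the file is actually read

Is there a way to read such large files in Databricks?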
Accepted Solutions

Kaniz_Fatma
Community Manager

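Hi @eva_mcmf, the error you're encountering occurs because your file exceeds the maximum length the binaryFile reader accepts: 2147483647 bytes (2^31 - 1), or approximately 2 GB. The reader loads each file's entire contents into the content column of a single row, and a JVM byte array cannot hold more than 2^31 - 1 bytes. The file you're trying to read is 7156086862 bytes (about 7 GB), well beyond this limit.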
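A quick Python check of the two numbers in the error message: it confirms that the limit is exactly 2**31 - 1 and shows the smallest number of pieces the file would have to be cut into to stay under it. Only the byte counts come from the error; the variable names are illustrative.

```python
import math

# Byte counts copied from the FileReadException message in the question.
FILE_LENGTH = 7_156_086_862
MAX_LENGTH = 2_147_483_647

# The limit is the largest signed 32-bit integer (Java's Integer.MAX_VALUE),
# which is also the upper bound on the length of a JVM byte array.
assert MAX_LENGTH == 2**31 - 1

# Smallest number of pieces that would each stay at or under the limit.
min_chunks = math.ceil(FILE_LENGTH / MAX_LENGTH)
print(f"file: {FILE_LENGTH / 2**30:.2f} GiB, limit: {MAX_LENGTH / 2**30:.2f} GiB, "
      f"pieces needed: {min_chunks}")
```

In practice you would pick a chunk size well below the limit rather than cutting the file into the bare minimum number of pieces.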

Unfortunately, there's no direct way to read such large files using the binaryFile format in Databricks.

However, you might consider the following alternatives:

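1. Split the large file into chunks that are each under the size limit before reading them into Databricks. This can be done outside of Databricks, using the file-splitting tools available on your operating system. Note that raw byte chunks of a rosbag are not valid bag files on their own, so they have to be reassembled in order before a rosbag parser can read them.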

2. If the data is in a splittable format such as Parquet or Avro, Spark can read a large file in parallel: the file is divided into byte ranges that separate tasks process across the cluster, so the contents of a single file never have to fit into one value. However, this does not apply to your rosbag file, which the binaryFile reader loads as a single blob.

3. If the large file is on a remote system like S3, you can increase the size of each part file as suggested in the third source.

This might help to avoid the error, but it's not guaranteed to work in all scenarios, especially if the file size is significantly larger than the limit.
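The first alternative (splitting the file before reading it) does not strictly require an OS-specific tool; a few lines of standard-library Python can do it. This is a minimal sketch, assuming the file is reachable through a local path (on Databricks, DBFS mounts are typically exposed under /dbfs/...). The function name split_file, the .partNNNN naming scheme, the 1 GiB default chunk size, and the output directory in the example are hypothetical choices, not part of any Databricks API.

```python
from __future__ import annotations

from pathlib import Path


def split_file(src: str, dst_dir: str, chunk_size: int = 1 << 30,
               buf_size: int = 8 << 20) -> list[str]:
    """Copy `src` into consecutive byte chunks of at most `chunk_size` bytes.

    Hypothetical helper: chunks are named <file>.part0000, <file>.part0001, ...
    in `dst_dir`. Data is copied `buf_size` bytes at a time, so memory use
    stays small even for multi-GB chunks.
    """
    if chunk_size <= 0 or buf_size <= 0:
        raise ValueError("chunk_size and buf_size must be positive")
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base = Path(src).name
    parts: list[str] = []
    with open(src, "rb") as f:
        index = 0
        while True:
            # Read the first buffer of the next chunk; stop at end of file
            # so no empty trailing chunk is written.
            block = f.read(min(buf_size, chunk_size))
            if not block:
                break
            part_path = out_dir / f"{base}.part{index:04d}"
            with open(part_path, "wb") as out:
                written = 0
                while block:
                    out.write(block)
                    written += len(block)
                    remaining = chunk_size - written
                    if remaining == 0:
                        break  # chunk is full
                    block = f.read(min(buf_size, remaining))
            parts.append(str(part_path))
            index += 1
    return parts


# Example (hypothetical output directory; /dbfs/... is the local FUSE view of DBFS):
# parts = split_file(
#     "/dbfs/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag",
#     "/dbfs/mnt/0-landingzone/tla/chunks",
# )
```

With the 1 GiB default, the 7156086862-byte file would produce seven chunk files, each comfortably under the 2147483647-byte limit. Copying through a small buffer, rather than reading a whole chunk at once, keeps the driver's memory use independent of the chunk size.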
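To illustrate the second alternative: for splittable formats, Spark plans the read as byte ranges of at most spark.sql.files.maxPartitionBytes (128 MB by default) and hands those ranges to tasks, so no single value has to hold the whole file. The sketch below is a simplified model of that planning step, not Spark's actual code (the real planner also takes spark.sql.files.openCostInBytes and cluster parallelism into account, and readers align to format-specific boundaries); the function name plan_splits is hypothetical.

```python
from __future__ import annotations

# Default of spark.sql.files.maxPartitionBytes (128 MB).
DEFAULT_MAX_SPLIT_BYTES = 128 * 1024 * 1024


def plan_splits(file_length: int,
                max_split_bytes: int = DEFAULT_MAX_SPLIT_BYTES) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges that cover a file of `file_length` bytes.

    Simplified model of how a splittable file is divided among tasks;
    it ignores openCostInBytes, parallelism, and format-specific alignment.
    """
    if file_length < 0 or max_split_bytes <= 0:
        raise ValueError("file_length must be >= 0 and max_split_bytes > 0")
    return [
        (offset, min(max_split_bytes, file_length - offset))
        for offset in range(0, file_length, max_split_bytes)
    ]


# The .bag file from the question, using the length reported in the error.
splits = plan_splits(7_156_086_862)
print(len(splits), "ranges; last range:", splits[-1])
```

Under this simplified model the 7156086862-byte file becomes 54 ranges of at most 128 MB each, all far below the 2147483647-byte limit. The approach only works when the reader can start parsing at an arbitrary offset, which a raw rosbag read through binaryFile cannot do.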
