Handling Binary Files Larger than 2GB in Apache Spark

pra18

I'm trying to process large binary files (>2 GB) in Apache Spark. The files are in .mf4 format (Measurement Data Format). Reading them fails with the following error:

org.apache.spark.SparkException: The length of ... is 14749763360, which exceeds the max length allowed: 2147483647.

 

What are the best approaches to handle large binary files in Spark? Are there any workarounds, such as splitting the file before processing or using a different format?
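
To make the splitting idea concrete, here is a minimal sketch in plain Python (not Spark) of how a file could be divided into byte ranges that each stay under the limit from the error message. The helper names chunk_ranges and read_range are hypothetical, chosen only for illustration. It is also an assumption on my part that raw byte ranges are useful at all: cutting an .mf4 file at arbitrary offsets would presumably not yield valid MF4 files, so this only shows the arithmetic and the range reads, not a format-aware split.

```python
# The limit quoted in the error message: 2**31 - 1 bytes.
MAX_RECORD_BYTES = 2_147_483_647


def chunk_ranges(total_size, chunk_size=MAX_RECORD_BYTES):
    """Return (offset, length) pairs that cover total_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, total_size - offset))
        for offset in range(0, total_size, chunk_size)
    ]


def read_range(path, offset, length):
    """Read one byte range from a file without loading the rest of it."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# File size taken from the error message above.
ranges = chunk_ranges(14_749_763_360)
print(len(ranges))  # 7 ranges, each at most MAX_RECORD_BYTES long
```

For the file size in the error, that gives 7 ranges. In practice a smaller chunk_size would probably be safer for memory, since each range is read into memory in full.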

Would appreciate any insights or best practices.

Thanks!