Re: Handling Binary Files Larger than 2GB in Apach...

pra18 · ‎02-17-2025

Thank you for the response. I didn't understand the command you mentioned.
Here is the context in which I'm facing this error:

I have a folder on ADLS Gen2 with a lot of subfolders, laid out as year/month/date/HH_MM_SS.mf4.
The file sizes range from 1 GB to 14 GB or more.
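
To illustrate the layout, here is a small sketch of how one file path maps to a timestamp. The helper name `timestamp_from_path` and the example path are made up for illustration, and it assumes the folder names are plain numbers (for example 2025/02/17):

```python
from datetime import datetime
from pathlib import PurePosixPath


def timestamp_from_path(path: str) -> datetime:
    """Parse .../year/month/date/HH_MM_SS.mf4 into a datetime."""
    p = PurePosixPath(path)
    # The last three folders are year, month and date; the file stem is HH_MM_SS.
    year, month, day = p.parts[-4:-1]
    return datetime.strptime(f"{year}-{month}-{day} {p.stem}", "%Y-%m-%d %H_%M_%S")


# Hypothetical example path, not a real file from my storage account.
print(timestamp_from_path("/mnt/adls_data/2025/02/17/10_30_00.mf4"))
```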

I got the error when I tried to convert the binary content to a DataFrame.
Command:

mf4_df = spark.read.format("binaryFile") \
    .option("pathGlobFilter", "*.mf4") \
    .option("recursiveFileLookup", "true") \
    .load("/mnt/adls_data/")

Result : mf4_df:pyspark.sql.connect.dataframe.DataFrame
path:string
modificationTime:timestamp
length:long
content:binary

Then I used a custom library, asammdf ("from asammdf import MDF"), to convert the binary content to a DataFrame.
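
Roughly, that conversion step looks like the sketch below. It is a simplified illustration, not the exact code: the helper name `content_to_dataframe` and its `open_mdf` parameter are made up here, and it assumes the bytes are first written to a temporary file so that `MDF` can open them by path.

```python
import os
import tempfile


def content_to_dataframe(content: bytes, open_mdf=None):
    """Write raw MF4 bytes to a temporary file and load them with asammdf."""
    if open_mdf is None:
        # Imported lazily so the helper can be defined without asammdf installed.
        from asammdf import MDF
        open_mdf = MDF

    fd, tmp_path = tempfile.mkstemp(suffix=".mf4")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(content)
        mdf = open_mdf(tmp_path)
        try:
            # to_dataframe() returns a pandas DataFrame of the channels.
            return mdf.to_dataframe()
        finally:
            mdf.close()
    finally:
        # Always clean up the temporary copy.
        os.remove(tmp_path)
```

Each call needs the whole file as one bytes object in memory, which is the part that gets difficult with files of several GB.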

Thanks!