Hi @eva_mcmf, The error occurs because the size of your binary file exceeds the maximum length Spark allows for a single binary record, which is 2147483647 bytes (about 2 GB). The file you're trying to read is about 7 GB, well beyond this limit.
Unfortunately, there's no direct way to read such large files using the binaryFile format in Databricks.
However, you might consider the following alternatives:
1. Split the large file into smaller chunks within the size limit before reading them into Databricks. This can be done outside of Databricks, using various file-splitting tools depending on your operating system.
2. If the format supports it (like Parquet or Avro), you can read it as a distributed, splittable file. These formats allow Spark to read portions of the file in parallel across multiple nodes in the cluster, so no single task ever has to hold the whole file, and the per-file size limit doesn't apply. However, this would not work with your rosbag file format, which Spark cannot split.
3. If the large file is written to a remote system like S3, you can control the size of each part file as suggested in the third source, so that no individual file exceeds the limit. This might help to avoid the error, but it's not guaranteed to work in all scenarios, especially if the file size is significantly larger than the limit.
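For option 1, a minimal sketch of the splitting step in plain Python (the function name `split_binary_file` and the paths are illustrative, not part of any Databricks API; in practice you'd run this wherever the 7 GB file lives before uploading the chunks):

```python
import os

def split_binary_file(src_path, out_dir, chunk_size):
    """Split src_path into numbered chunks of at most chunk_size bytes each."""
    os.makedirs(out_dir, exist_ok=True)
    part_paths = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of file reached
                break
            part_path = os.path.join(out_dir, f"part-{index:05d}.bin")
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths

# In Databricks you could then point binaryFile at the whole directory,
# e.g. (path is hypothetical):
# df = spark.read.format("binaryFile").load("dbfs:/tmp/rosbag_chunks/")
```

Each resulting part stays under the 2 GB record limit, so binaryFile can load them; note the chunks are raw byte ranges, so any tool that later consumes the rosbag data would need to reassemble them in order.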