<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Very large binary files ingestion error when using binaryFile reader in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/very-large-binary-files-ingestion-error-when-using-binaryfile/m-p/47440#M5929</link>
    <description>&lt;P&gt;Hello, I am facing an error while trying to read a large binary file (rosbag format) with the binaryFile reader. The file I am trying to read is approximately 7 GB. Here is the error message I am getting:&lt;/P&gt;&lt;PRE&gt;FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.&lt;/PRE&gt;&lt;P&gt;Here is the code:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, LongType, BinaryType
)

BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)

binary_df = (
    spark.read.format("binaryFile")
    .schema(BINARY_FILES_SCHEMA)
    .load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
)
binary_df.printSchema()
display(binary_df)&lt;/PRE&gt;&lt;P&gt;Is there a way to read such large files in Databricks?&lt;/P&gt;</description>
    <pubDate>Mon, 02 Oct 2023 08:21:16 GMT</pubDate>
    <dc:creator>eva_mcmf</dc:creator>
    <dc:date>2023-10-02T08:21:16Z</dc:date>
    <item>
      <title>Very large binary files ingestion error when using binaryFile reader</title>
      <link>https://community.databricks.com/t5/get-started-discussions/very-large-binary-files-ingestion-error-when-using-binaryfile/m-p/47440#M5929</link>
      <description>&lt;P&gt;Hello, I am facing an error while trying to read a large binary file (rosbag format) with the binaryFile reader. The file I am trying to read is approximately 7 GB. Here is the error message I am getting:&lt;/P&gt;&lt;PRE&gt;FileReadException: Error while reading file dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag. Caused by: SparkException: The length of dbfs:/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag is 7156086862, which exceeds the max length allowed: 2147483647.&lt;/PRE&gt;&lt;P&gt;Here is the code:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, LongType, BinaryType
)

BINARY_FILES_SCHEMA = StructType(
    [
        StructField("path", StringType()),
        StructField("modificationTime", TimestampType()),
        StructField("length", LongType()),
        StructField("content", BinaryType()),
    ]
)

binary_df = (
    spark.read.format("binaryFile")
    .schema(BINARY_FILES_SCHEMA)
    .load("/mnt/0-landingzone/tla/7a0cb35d-b606-4a9e-890b-83fc385f78ca.bag")
)
binary_df.printSchema()
display(binary_df)&lt;/PRE&gt;&lt;P&gt;Is there a way to read such large files in Databricks?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Oct 2023 08:21:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/very-large-binary-files-ingestion-error-when-using-binaryfile/m-p/47440#M5929</guid>
      <dc:creator>eva_mcmf</dc:creator>
      <dc:date>2023-10-02T08:21:16Z</dc:date>
    </item>
  </channel>
</rss>