<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Error while reading file from Cloud Storage in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113170#M44449</link>
    <description>&lt;P&gt;The code we are executing:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;df = spark.read.format("parquet").load("/mnt/g/drb/HN/")&lt;BR /&gt;df.write.mode('overwrite').saveAsTable("bronze.HN")&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The error it throws:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;There are multiple Parquet files in /mnt/g/drb/HN/. When we load all of them into a single Spark DataFrame and display it, the data is shown correctly. However, when we try to save the DataFrame as a table, the error above is thrown.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How we tried to save the table from the Spark DataFrame: create a temp view, then save it as a table.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;We tried increasing the compute size (it is currently at 24 DBU), but that did not resolve the issue.&lt;/P&gt;&lt;P&gt;For other Parquet files in a different cloud storage container, we are able to create tables correctly (in the hive_metastore).&lt;/P&gt;&lt;P&gt;How can we store these Parquet files in a table?&lt;/P&gt;</description>
    <pubDate>Thu, 20 Mar 2025 15:31:40 GMT</pubDate>
    <dc:creator>DylanStout</dc:creator>
    <dc:date>2025-03-20T15:31:40Z</dc:date>
    <item>
      <title>Error while reading file from Cloud Storage</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113170#M44449</link>
      <description>&lt;P&gt;The code we are executing:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;df = spark.read.format("parquet").load("/mnt/g/drb/HN/")&lt;BR /&gt;df.write.mode('overwrite').saveAsTable("bronze.HN")&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The error it throws:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 failed 4 times, most recent failure: Lost task 44.3 in stage 642.0 (TID 8175) (10.1.162.134 executor 1): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/g/drb/HN/HN_1.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;There are multiple Parquet files in /mnt/g/drb/HN/. When we load all of them into a single Spark DataFrame and display it, the data is shown correctly. However, when we try to save the DataFrame as a table, the error above is thrown.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How we tried to save the table from the Spark DataFrame: create a temp view, then save it as a table.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;We tried increasing the compute size (it is currently at 24 DBU), but that did not resolve the issue.&lt;/P&gt;&lt;P&gt;For other Parquet files in a different cloud storage container, we are able to create tables correctly (in the hive_metastore).&lt;/P&gt;&lt;P&gt;How can we store these Parquet files in a table?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 15:31:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113170#M44449</guid>
      <dc:creator>DylanStout</dc:creator>
      <dc:date>2025-03-20T15:31:40Z</dc:date>
    </item>
    <item>
      <title>Re: Error while reading file from Cloud Storage</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113171#M44450</link>
      <description>&lt;P&gt;Try the solutions in this thread:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/how-can-i-convert-a-parquet-into-delta-table/td-p/14348" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/how-can-i-convert-a-parquet-into-delta-table/td-p/14348&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 15:46:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113171#M44450</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-03-20T15:46:52Z</dc:date>
    </item>
    <item>
      <title>Re: Error while reading file from Cloud Storage</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113810#M44645</link>
      <description>&lt;P&gt;&lt;SPAN&gt;spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Mar 2025 13:51:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-reading-file-from-cloud-storage/m-p/113810#M44645</guid>
      <dc:creator>DylanStout</dc:creator>
      <dc:date>2025-03-27T13:51:40Z</dc:date>
    </item>
  </channel>
</rss>

