<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30977#M22517</link>
    <description>&lt;P&gt;Hi @mayuri18kadam@gmail.com,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This could be caused by a limitation of spark-submit jobs. Please check the docs here: &lt;A href="https://docs.databricks.com/jobs.html#create-a-job" target="test_blank"&gt;https://docs.databricks.com/jobs.html#create-a-job&lt;/A&gt;, and look for the following information:&lt;/P&gt;&lt;P&gt;Important&lt;/P&gt;&lt;P&gt;There are several limitations for spark-submit tasks....&lt;/P&gt;</description>
    <pubDate>Wed, 09 Feb 2022 01:04:57 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-02-09T01:04:57Z</dc:date>
    <item>
      <title>com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch</title>
      <link>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30974#M22514</link>
      <description>&lt;P&gt;Hi, I am getting the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4178615623264760328.c000.avro.
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is 8P7bo1mnLPoLxVw==, retrieved bu+CiCkLm/kc6QA==.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here processYear, processMonth, processDay, and processHour are partition columns.&lt;/P&gt;&lt;P&gt;However, this is only logged as a WARN, and the code continues to execute (I am also able to read this file separately in a notebook), but eventually the job dies with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;WARN Lost task 9026.0 in stage 324.0 (TID 1525596, 10.139.64.16, executor 83): TaskKilled (Stage cancelled)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am using the following Databricks and Spark configs:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;RuntimeVersion: 5.5.x-scala2.11
MasterConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 1
WorkerConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 2&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This same job is deployed in several other environments with much higher data volumes, and it does not fail there. Any idea why it might fail here?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jan 2022 04:45:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30974#M22514</guid>
      <dc:creator>mayuri18kadam</dc:creator>
      <dc:date>2022-01-25T04:45:25Z</dc:date>
    </item>
    <item>
      <title>Re: com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch</title>
      <link>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30975#M22515</link>
      <description>&lt;P&gt;Can you try running your code without the file you are getting the exception on?&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jan 2022 12:25:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30975#M22515</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-01-25T12:25:34Z</dc:date>
    </item>
    <item>
      <title>Re: com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch</title>
      <link>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30976#M22516</link>
      <description>&lt;P&gt;Yes, I can read it from a notebook with DBR 6.4 when I specify this path: &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;but running the same code with DBR 6.4 from spark-submit fails again, each time complaining about different part files under different partitions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, we have the exact same code, with the exact same Spark configs, deployed in several different regions, but this is the only one where we have the issue. Could this be data related, for example some part-file size limitation in the given Spark version?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jan 2022 18:05:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30976#M22516</guid>
      <dc:creator>mayuri18kadam</dc:creator>
      <dc:date>2022-01-26T18:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch</title>
      <link>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30977#M22517</link>
      <description>&lt;P&gt;Hi @mayuri18kadam@gmail.com,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This could be caused by a limitation of spark-submit jobs. Please check the docs here: &lt;A href="https://docs.databricks.com/jobs.html#create-a-job" target="test_blank"&gt;https://docs.databricks.com/jobs.html#create-a-job&lt;/A&gt;, and look for the following information:&lt;/P&gt;&lt;P&gt;Important&lt;/P&gt;&lt;P&gt;There are several limitations for spark-submit tasks....&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 01:04:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/com-databricks-sql-io-filereadexception-caused-by-com-microsoft/m-p/30977#M22517</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-02-09T01:04:57Z</dc:date>
    </item>
  </channel>
</rss>

