<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: spark.read.parquet() - how to check for file lock before reading? (azure) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32330#M23555</link>
    <description>&lt;P&gt;That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can see via the upload tool that the file upload is 'in progress'.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can also see the 0-byte destination file in the adlsv2 container (while it's being uploaded).&lt;/P&gt;</description>
    <pubDate>Fri, 09 Sep 2022 02:33:57 GMT</pubDate>
    <dc:creator>jakubk</dc:creator>
    <dc:date>2022-09-09T02:33:57Z</dc:date>
    <item>
      <title>spark.read.parquet() - how to check for file lock before reading? (azure)</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32328#M23553</link>
      <description>&lt;P&gt;I have some Python code which takes parquet files from an adlsv2 location and merges them into Delta tables (run as a workflow job on a schedule).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a try/catch wrapper around this so that any files that fail get moved into a failed folder using dbutils.fs.mv, while the files that get processed are archived off to a different location.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One scenario I've encountered is this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;- an external upload process is uploading somefile.parquet to adlsv2&lt;/P&gt;&lt;P&gt;- the workflow job starts&lt;/P&gt;&lt;P&gt;- spark.read.parquet() fails with: Caused by: java.io.IOException: Could not read footer for file:&lt;/P&gt;&lt;P&gt;- dbutils.fs.mv moves the file (boo)&lt;/P&gt;&lt;P&gt;- the external process fails because mv has deleted the target while the upload is in progress&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'd assumed that mv would fail because there would be an exclusive lock on the file while it's being uploaded, but that's not the case (??)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any suggestions on how to handle this?&lt;/P&gt;&lt;P&gt;Is there a way for me to check if a file is locked/being written to?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What's the error/exception to catch for this? I've spent hours trying to figure it out, but the generic Python exceptions don't cover it and I get a NameError for the specific Spark ones I try.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 04:52:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32328#M23553</guid>
      <dc:creator>jakubk</dc:creator>
      <dc:date>2022-09-08T04:52:25Z</dc:date>
    </item>
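The failure mode in the question (reading somefile.parquet while a third-party tool is still uploading it) can be mitigated with a size-stability probe, since ADLS Gen2 exposes no POSIX-style lock that a reader can test for. The sketch below is an illustrative assumption, not behavior documented by Databricks: it treats a file as in-progress if it is zero bytes (the placeholder the poster observed) or if its size changes during a short settle window. It uses local `os.path.getsize()` as a stand-in; on Databricks you would take sizes from a `dbutils.fs.ls()` listing instead (a hypothetical adaptation not shown here).

```python
import os
import time

def looks_complete(path: str, settle_seconds: float = 2.0) -> bool:
    """Heuristic check that a file has finished uploading before reading it.

    Returns False for a 0-byte file (the placeholder visible during an
    in-progress ADLS upload) or for a file whose size changes during the
    settle window; otherwise returns True. This is a best-effort probe,
    not a lock: a slow uploader that pauses longer than settle_seconds
    can still slip through.
    """
    size_before = os.path.getsize(path)
    if size_before == 0:  # 0-byte destination file seen while upload is in progress
        return False
    time.sleep(settle_seconds)
    return os.path.getsize(path) == size_before
```

Files that fail this probe can simply be skipped and retried on the next scheduled run, rather than read (and then mistakenly moved to the failed folder) while the upload is still in flight.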
    <item>
      <title>Re: spark.read.parquet() - how to check for file lock before reading? (azure)</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32329#M23554</link>
      <description>&lt;P&gt;Do you have any idea how the file would be locked? Because it should not be locked (unless the file is actually still being written, i.e. not finished yet).&lt;/P&gt;</description>
      <pubDate>Thu, 08 Sep 2022 09:55:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32329#M23554</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-08T09:55:49Z</dc:date>
    </item>
    <item>
      <title>Re: spark.read.parquet() - how to check for file lock before reading? (azure)</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32330#M23555</link>
      <description>&lt;P&gt;That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can see via the upload tool that the file upload is 'in progress'.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can also see the 0-byte destination file in the adlsv2 container (while it's being uploaded).&lt;/P&gt;</description>
      <pubDate>Fri, 09 Sep 2022 02:33:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-parquet-how-to-check-for-file-lock-before-reading/m-p/32330#M23555</guid>
      <dc:creator>jakubk</dc:creator>
      <dc:date>2022-09-09T02:33:57Z</dc:date>
    </item>
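On the poster's other question (what exception to catch): `spark.read.parquet()` typically surfaces JVM-side failures as a generic Python exception (often `py4j.protocol.Py4JJavaError`), so there is usually no narrow Python class that isolates the "Could not read footer" case, which would explain the NameError the poster hit when guessing at specific Spark exception names. A pragmatic workaround is to catch broadly and classify by message text before deciding whether to `dbutils.fs.mv` the file to the failed folder. The marker strings below are assumptions: "Could not read footer" comes from the stack trace quoted in the thread, while "is not a Parquet file" is a hypothetical second marker, not confirmed by the thread.

```python
# Markers that suggest the parquet file was incomplete or corrupt at read
# time. "Could not read footer" is taken from the thread's stack trace;
# "is not a Parquet file" is an assumed additional marker.
_INCOMPLETE_MARKERS = ("Could not read footer", "is not a Parquet file")

def is_incomplete_file_error(exc: Exception) -> bool:
    """Return True if the exception message looks like a partial-upload
    parquet read failure, so the caller can skip the file (and retry
    later) instead of moving a still-uploading file to a failed folder.
    """
    message = str(exc)
    return any(marker in message for marker in _INCOMPLETE_MARKERS)
```

In the workflow's try/except, this would sit inside a broad `except Exception as e:` block: if `is_incomplete_file_error(e)` is true, leave the file in place for the next run; otherwise fall through to the existing move-to-failed-folder logic.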
  </channel>
</rss>

