<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the Pyspark equivalent of FSCK REPAIR TABLE? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5707#M2048</link>
    <description>&lt;PRE&gt;&lt;CODE&gt;%scala
// Find the Delta versions in which a given Parquet file was added.
import org.apache.spark.sql.functions.col

(&amp;lt;oldest-version-available&amp;gt; to &amp;lt;newest-version-available&amp;gt;).foreach { version =&amp;gt;
  // Each commit file is named after its version, zero-padded to 20 digits.
  val added = spark.read.json(f"&amp;lt;delta-table-location&amp;gt;/_delta_log/$version%020d.json")
    .where("add is not null")
    .select("add.path")
  if (added.filter(col("path").contains("name-of-the-parquet-file")).count &amp;gt; 0) {
    println("********* " + version)
  }
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;@Dean Lovelace - Use the snippet above to find the available versions in which the file was added.&lt;/P&gt;&lt;P&gt;Replace oldest-version-available and newest-version-available with the range of version numbers from the Delta history that you want to check, and replace the Delta path with the location of your Delta table.&lt;/P&gt;&lt;P&gt;If you still hit this error while reading (after running FSCK REPAIR), try setting the following Spark configs:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.files.ignoreCorruptFiles true
spark.sql.files.ignoreMissingFiles true&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Wed, 19 Apr 2023 14:40:38 GMT</pubDate>
    <dc:creator>shan_chandra</dc:creator>
    <dc:date>2023-04-19T14:40:38Z</dc:date>
    <item>
      <title>What is the Pyspark equivalent of FSCK REPAIR TABLE?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5704#M2045</link>
      <description>&lt;P&gt;I am using the Delta format and occasionally get the following error:&lt;/P&gt;&lt;P&gt;&lt;I&gt;"xx.parquet referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement"&lt;/I&gt;&lt;/P&gt;&lt;P&gt;FSCK REPAIR TABLE works for Hive-based tables, but I am using the file system only.&lt;/P&gt;&lt;P&gt;How can I re-create the Delta transaction log (rather than rebuilding the whole dataset)?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Apr 2023 14:14:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5704#M2045</guid>
      <dc:creator>Dean_Lovelace</dc:creator>
      <dc:date>2023-04-17T14:14:04Z</dc:date>
    </item>
    <item>
      <title>Re: What is the Pyspark equivalent of FSCK REPAIR TABLE?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5706#M2047</link>
      <description>&lt;P&gt;Hi @Dean Lovelace&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Tue, 18 Apr 2023 07:39:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5706#M2047</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-18T07:39:49Z</dc:date>
    </item>
    <item>
      <title>Re: What is the Pyspark equivalent of FSCK REPAIR TABLE?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5707#M2048</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;%scala
// Find the Delta versions in which a given Parquet file was added.
import org.apache.spark.sql.functions.col

(&amp;lt;oldest-version-available&amp;gt; to &amp;lt;newest-version-available&amp;gt;).foreach { version =&amp;gt;
  // Each commit file is named after its version, zero-padded to 20 digits.
  val added = spark.read.json(f"&amp;lt;delta-table-location&amp;gt;/_delta_log/$version%020d.json")
    .where("add is not null")
    .select("add.path")
  if (added.filter(col("path").contains("name-of-the-parquet-file")).count &amp;gt; 0) {
    println("********* " + version)
  }
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;@Dean Lovelace - Use the snippet above to find the available versions in which the file was added.&lt;/P&gt;&lt;P&gt;Replace oldest-version-available and newest-version-available with the range of version numbers from the Delta history that you want to check, and replace the Delta path with the location of your Delta table.&lt;/P&gt;&lt;P&gt;If you still hit this error while reading (after running FSCK REPAIR), try setting the following Spark configs:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.files.ignoreCorruptFiles true
spark.sql.files.ignoreMissingFiles true&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 19 Apr 2023 14:40:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5707#M2048</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2023-04-19T14:40:38Z</dc:date>
    </item>
    <item>
      <title>Re: What is the Pyspark equivalent of FSCK REPAIR TABLE?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5705#M2046</link>
      <description>&lt;P&gt;Hi, please refer to &lt;A href="https://kb.databricks.com/delta/filereadexception-when-reading-delta-table" alt="https://kb.databricks.com/delta/filereadexception-when-reading-delta-table" target="_blank"&gt;https://kb.databricks.com/delta/filereadexception-when-reading-delta-table&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Please let us know if this helps.&lt;/P&gt;&lt;P&gt;Also, please tag @Debayan Mukherjee in your next response so that I am notified. Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 18 Apr 2023 06:35:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-pyspark-equivalent-of-fsck-repair-table/m-p/5705#M2046</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-04-18T06:35:42Z</dc:date>
    </item>
  </channel>
</rss>

