<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Manual overwrite in s3 console of a collection of parquet files and now we can't read them. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/manual-overwrite-in-s3-console-of-a-collection-of-parquet-files/m-p/27983#M19821</link>
    <description>&lt;P&gt;Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first.&lt;/P&gt;&lt;P&gt;Thanks in advance for your patience. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 16 Feb 2022 16:24:05 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-02-16T16:24:05Z</dc:date>
    <item>
      <title>Manual overwrite in s3 console of a collection of parquet files and now we can't read them.</title>
      <link>https://community.databricks.com/t5/data-engineering/manual-overwrite-in-s3-console-of-a-collection-of-parquet-files/m-p/27982#M19820</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while reading file s3://s3-datascience-prod/redshift/daily/raw/ds/product_information/date=2022-02-05/part-00002-d81f7a47-0421-42a7-9187-f421b0c734b9.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see &lt;A href="https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions" target="test_blank"&gt;https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;We are trying to copy a Delta-format file with known-good properties over the files for several days that have known-bad properties. We initially did this with the S3 CLI but hit the error above. We then tried copying over the same file paths with dbutils from a Databricks notebook and got the same error. How can we reset the table to a state where this error no longer occurs and our known-good Delta files are in place for every date in the range?&lt;/P&gt;</description>
      <pubDate>Tue, 15 Feb 2022 21:05:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/manual-overwrite-in-s3-console-of-a-collection-of-parquet-files/m-p/27982#M19820</guid>
      <dc:creator>fff_ds</dc:creator>
      <dc:date>2022-02-15T21:05:44Z</dc:date>
    </item>
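    <!--
The error above means the Delta transaction log (the JSON commits under the
table's _delta_log/ directory) still lists data files that the manual copy
replaced or removed. A minimal diagnostic sketch, in Python with only the
standard library, replays those commits and reports which referenced files no
longer exist. Assumptions: the table directory is reachable as a local or
mounted path, every JSON commit since version 0 is still present (checkpoint
files are ignored), and the log stores paths relative to the table root. The
function names active_files and missing_files are hypothetical, not part of
any Delta Lake or Databricks API.

```python
import json
import os
import re
from urllib.parse import unquote

# Commit files are named with a 20-digit, zero-padded version number.
COMMIT_NAME = re.compile(r"\d{20}\.json")


def active_files(table_dir):
    """Replay the JSON commits in _delta_log and return the set of data-file
    paths (relative to table_dir) referenced by the latest table version."""
    log_dir = os.path.join(table_dir, "_delta_log")
    # Zero padding makes a lexicographic sort chronological. Checkpoints,
    # _last_checkpoint and .crc files do not match and are skipped.
    commits = sorted(n for n in os.listdir(log_dir) if COMMIT_NAME.fullmatch(n))
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                # Log paths are URI-encoded and relative to the table root.
                if "add" in action:
                    live.add(unquote(action["add"]["path"]))
                elif "remove" in action:
                    live.discard(unquote(action["remove"]["path"]))
    return live


def missing_files(table_dir):
    """Return, sorted, the referenced data files that are absent on disk."""
    return sorted(
        path
        for path in active_files(table_dir)
        if not os.path.exists(os.path.join(table_dir, path))
    )
```

Files copied in by hand never appear in this set, because only writes made
through Delta add entries to the log; that is why copying known-good files over
the old paths does not make them readable. The paths this sketch reports are
the stale entries behind the FileReadException. On Databricks, the
FSCK REPAIR TABLE command is documented to drop such entries from the log,
after which the known-good data would still need to be written through Delta
itself.
    -->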
    <item>
      <title>Re: Manual overwrite in s3 console of a collection of parquet files and now we can't read them.</title>
      <link>https://community.databricks.com/t5/data-engineering/manual-overwrite-in-s3-console-of-a-collection-of-parquet-files/m-p/27983#M19821</link>
      <description>&lt;P&gt;Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first.&lt;/P&gt;&lt;P&gt;Thanks in advance for your patience. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 16:24:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/manual-overwrite-in-s3-console-of-a-collection-of-parquet-files/m-p/27983#M19821</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-16T16:24:05Z</dc:date>
    </item>
  </channel>
</rss>

