<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader - understanding missing file after schema update. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8178#M3886</link>
    <description>&lt;P&gt;Hi @Debayan Mukherjee&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have a custom Spark conf, except for the following line, which makes it ignore the missing file:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.files.ignoreMissingFiles true&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The cluster configuration:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Policy: Unrestricted
Multi node
Access mode: Single user
Databricks runtime version: 11.3 LTS (Scala 2.12, Spark 3.3.0)
Worker type: r5d.xlarge
Workers: 2 (64 GB Memory, 8 Cores)
Driver type: Same as worker (32 GB Memory, 4 Cores)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm also using Unity Catalog, if that helps.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Mar 2023 15:54:59 GMT</pubDate>
    <dc:creator>Larrio</dc:creator>
    <dc:date>2023-03-17T15:54:59Z</dc:date>
    <item>
      <title>Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8174#M3882</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Concerning Autoloader (based on &lt;A href="https://docs.databricks.com/ingestion/auto-loader/schema.html" alt="https://docs.databricks.com/ingestion/auto-loader/schema.html" target="_blank"&gt;https://docs.databricks.com/ingestion/auto-loader/schema.html&lt;/A&gt;): so far, what I understand is that when it detects a schema update, the stream fails and I have to rerun it to make it work, which is fine.&lt;/P&gt;&lt;P&gt;But once I rerun it, it looks for missing files, which raises the following exception:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://some-bucket/path/to/data/1999/10/20/***.parquet. [CLOUD_FILE_SOURCE_FILE_NOT_FOUND] A file notification was received for file: s3://some-bucket/path/to/data/1999/10/20/***.parquet but it does not exist anymore. Please ensure that files are not deleted before they are processed. To continue your stream, you can set the Spark SQL configuration spark.sql.files.ignoreMissingFiles to true.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It works once I set ignoreMissingFiles to true.&lt;/P&gt;&lt;P&gt;I understand that it fails the first time it detects a change, but why does it look for deleted files the second time Autoloader runs?&lt;/P&gt;&lt;P&gt;What is the impact? Do I lose data?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2023 10:06:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8174#M3882</guid>
      <dc:creator>Larrio</dc:creator>
      <dc:date>2023-03-07T10:06:48Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8176#M3884</link>
      <description>&lt;P&gt;Hello @Debayan Mukherjee&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your answer. I had already seen this article, and it's good to know how a missing file is handled.&lt;/P&gt;&lt;P&gt;But my question is more about Autoloader itself: why do we have missing files in the first place?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 09:02:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8176#M3884</guid>
      <dc:creator>Larrio</dc:creator>
      <dc:date>2023-03-09T09:02:44Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8178#M3886</link>
      <description>&lt;P&gt;Hi @Debayan Mukherjee&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have a custom Spark conf, except for the following line, which makes it ignore the missing file:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.files.ignoreMissingFiles true&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The cluster configuration:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Policy: Unrestricted
Multi node
Access mode: Single user
Databricks runtime version: 11.3 LTS (Scala 2.12, Spark 3.3.0)
Worker type: r5d.xlarge
Workers: 2 (64 GB Memory, 8 Cores)
Driver type: Same as worker (32 GB Memory, 4 Cores)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm also using Unity Catalog, if that helps.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2023 15:54:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8178#M3886</guid>
      <dc:creator>Larrio</dc:creator>
      <dc:date>2023-03-17T15:54:59Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8179#M3887</link>
      <description>&lt;P&gt;Hi @Lucien Arrio&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope all is well! Just checking in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 01 Apr 2023 00:47:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8179#M3887</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-01T00:47:18Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8180#M3888</link>
      <description>&lt;P&gt;Hello, I still don't have an answer. I understand how Spark handles missing files, but I don't know why we have missing files in the first place.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2023 08:42:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8180#M3888</guid>
      <dc:creator>Larrio</dc:creator>
      <dc:date>2023-04-06T08:42:32Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8175#M3883</link>
      <description>&lt;P&gt;Hi, I found an interesting article on the same error: &lt;A href="https://www.waitingforcode.com/apache-spark-sql/ignoring-files-issues-apache-spark-sql/read" alt="https://www.waitingforcode.com/apache-spark-sql/ignoring-files-issues-apache-spark-sql/read" target="_blank"&gt;https://www.waitingforcode.com/apache-spark-sql/ignoring-files-issues-apache-spark-sql/read&lt;/A&gt;. Let us know if this helps.&lt;/P&gt;&lt;P&gt;Also, please tag&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WWwvAAG" alt="https://community.databricks.com/s/profile/0053f000000WWwvAAG" target="_blank"&gt;@Debayan&lt;/A&gt; in your next response so that I get notified. Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 06:28:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8175#M3883</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-03-09T06:28:45Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader - understanding missing file after schema update.</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8177#M3885</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Could you please confirm your cluster configuration and your Spark conf?&lt;/P&gt;</description>
      <pubDate>Mon, 13 Mar 2023 05:43:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-understanding-missing-file-after-schema-update/m-p/8177#M3885</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-03-13T05:43:45Z</dc:date>
    </item>
  </channel>
</rss>

