<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting FileNotFoundException while using cloudFiles in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43638#M932</link>
    <description>&lt;P&gt;Danny, is another process mutating or deleting the incoming files?&lt;/P&gt;</description>
    <pubDate>Tue, 05 Sep 2023 13:05:02 GMT</pubDate>
    <dc:creator>BilalAslamDbrx</dc:creator>
    <dc:date>2023-09-05T13:05:02Z</dc:date>
    <item>
      <title>Getting FileNotFoundException while using cloudFiles</title>
      <link>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43626#M931</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;The following is the code I am using to ingest the data incrementally (weekly).&lt;/P&gt;&lt;PRE&gt;val ssdf = spark.readStream
  .schema(schema)
  .format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  .load(sourceUrl)
  .filter(criteriaFilter)

val transformedDf = ssdf.transform(.....)

val processData = transformedDf
  .select(recordFields: _*)
  .writeStream
  .option("checkpointLocation", outputUrl + "checkpoint/")
  .format("parquet")
  .outputMode("append")
  .option("path", outputUrl + run_id + "/")
  .trigger(Trigger.Once())
  .start()

processData.processAllAvailable()
processData.stop()&lt;/PRE&gt;&lt;P&gt;For each week, the data is written to a new folder, while the checkpoint stays in the same folder.&lt;BR /&gt;This worked fine for 3 to 5 incremental runs.&lt;BR /&gt;But recently I got the following error:&lt;/P&gt;&lt;PRE&gt;ERROR: Query termination received for [id=2345245425], with exception: org.apache.spark.SparkException: Job aborted.
Caused by: java.io.FileNotFoundException: Unable to find batch s3://outputPath/20230810063959/_spark_metadata/0&lt;/PRE&gt;&lt;P&gt;What is the reason for this issue? Any ideas?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Sep 2023 11:14:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43626#M931</guid>
      <dc:creator>dannythermadom</dc:creator>
      <dc:date>2023-09-05T11:14:42Z</dc:date>
    </item>
    <item>
      <title>Re: Getting FileNotFoundException while using cloudFiles</title>
      <link>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43638#M932</link>
      <description>&lt;P&gt;Danny, is another process mutating or deleting the incoming files?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Sep 2023 13:05:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43638#M932</guid>
      <dc:creator>BilalAslamDbrx</dc:creator>
      <dc:date>2023-09-05T13:05:02Z</dc:date>
    </item>
    <item>
      <title>Re: Getting FileNotFoundException while using cloudFiles</title>
      <link>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43640#M933</link>
      <description>&lt;P&gt;New files get added to the input location. Input files are not deleted or updated.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Sep 2023 13:08:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/getting-filenotfoundexception-while-using-cloudfiles/m-p/43640#M933</guid>
      <dc:creator>dannythermadom</dc:creator>
      <dc:date>2023-09-05T13:08:46Z</dc:date>
    </item>
  </channel>
</rss>

