<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Autoloader failed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12942#M7690</link>
    <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Capture"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2380i9AB0A4786E06888F/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture" alt="Capture" /&gt;&lt;/span&gt;Thank you, Deepak. I can see the folder and many folders inside it. How can I identify whether any changes were made in Azure Gen2?&lt;/P&gt;</description>
    <pubDate>Wed, 20 Oct 2021 19:43:15 GMT</pubDate>
    <dc:creator>dimoobraznii</dc:creator>
    <dc:date>2021-10-20T19:43:15Z</dc:date>
    <item>
      <title>Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12937#M7685</link>
      <description>&lt;P&gt;I used Auto Loader with TriggerOnce = true and ran it on a schedule for weeks. Today it broke with this error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The metadata file in the streaming source checkpoint directory is missing. This metadata&lt;/P&gt;&lt;P&gt;file contains important default options for the stream, so the stream cannot be restarted&lt;/P&gt;&lt;P&gt;right now. Please contact Databricks support for assistance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;StreamingQueryException: &lt;/P&gt;&lt;P&gt;---------------------------------------------------------------------------&lt;/P&gt;&lt;P&gt;StreamingQueryException                   Traceback (most recent call last)&lt;/P&gt;&lt;P&gt;&amp;lt;command-1866658421247823&amp;gt; in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;      1 #Waiting end of autoloader&lt;/P&gt;&lt;P&gt;----&amp;gt; 2 autoloader_query.awaitTermination()&lt;/P&gt;&lt;P&gt;      3 &lt;/P&gt;&lt;P&gt;      4 #Show the output from the autoloader job&lt;/P&gt;&lt;P&gt;      5 autoloader_query.recentProgress&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/spark/python/pyspark/sql/streaming.py in awaitTermination(self, timeout)&lt;/P&gt;&lt;P&gt;     99             return self._jsq.awaitTermination(int(timeout * 1000))&lt;/P&gt;&lt;P&gt;    100         else:&lt;/P&gt;&lt;P&gt;--&amp;gt; 101             return self._jsq.awaitTermination()&lt;/P&gt;&lt;P&gt;    102 &lt;/P&gt;&lt;P&gt;    103     @property&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)&lt;/P&gt;&lt;P&gt;   1302 &lt;/P&gt;&lt;P&gt;   1303         answer = self.gateway_client.send_command(command)&lt;/P&gt;&lt;P&gt;-&amp;gt; 1304         return_value = get_return_value(&lt;/P&gt;&lt;P&gt;   1305             answer, self.gateway_client, self.target_id, self.name)&lt;/P&gt;&lt;P&gt;   1306 &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/spark/python/pyspark/sql/utils.py in 
deco(*a, **kw)&lt;/P&gt;&lt;P&gt;    121                 # Hide where the exception came from that shows a non-Pythonic&lt;/P&gt;&lt;P&gt;    122                 # JVM exception message.&lt;/P&gt;&lt;P&gt;--&amp;gt; 123                 raise converted from None&lt;/P&gt;&lt;P&gt;    124             else:&lt;/P&gt;&lt;P&gt;    125                 raise&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;StreamingQueryException: &lt;/P&gt;&lt;P&gt;The metadata file in the streaming source checkpoint directory is missing. This metadata&lt;/P&gt;&lt;P&gt;file contains important default options for the stream, so the stream cannot be restarted&lt;/P&gt;&lt;P&gt;right now. Please contact Databricks support for assistance.&lt;/P&gt;&lt;P&gt;       &lt;/P&gt;&lt;P&gt;=== Streaming Query ===&lt;/P&gt;&lt;P&gt;Identifier: [id = 0416c163-a2de-4f6d-82f7-189a0e7bb39e, runId = 5b7a00bb-3c27-4f04-bfd7-bce8d36bf225]&lt;/P&gt;&lt;P&gt;Current Committed Offsets: {CloudFilesSource[wasbs://data@&amp;lt;MY STORAGE&amp;gt;.blob.core.windows.net/*/*/*/*/]: {"seqNum":18550,"sourceVersion":1,"lastBackfillStartTimeMs":1632167318876,"lastBackfillFinishTimeMs":1632167323294}}&lt;/P&gt;&lt;P&gt;Current Available Offsets: {CloudFilesSource[wasbs://bidata@colllectorprotostorage.blob.core.windows.net/*/*/*/*/]: {"seqNum":18560,"sourceVersion":1,"lastBackfillStartTimeMs":1632167318876,"lastBackfillFinishTimeMs":1632167323294}}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Current State: ACTIVE&lt;/P&gt;&lt;P&gt;Thread State: RUNNABLE&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 06:00:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12937#M7685</guid>
      <dc:creator>dimoobraznii</dc:creator>
      <dc:date>2021-10-20T06:00:08Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12939#M7687</link>
      <description>&lt;P&gt;.option("checkpointLocation", "dbfs://checkpointPath")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In future you can specify location by yourself (using above option) to have better control on it. I bet that checkpoint was on source directory and somehow it was corrupted or deleted.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 09:24:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12939#M7687</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-20T09:24:55Z</dc:date>
    </item>
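The advice above can be sketched end to end. A minimal Auto Loader pipeline with an explicitly pinned checkpoint location might look like the following; this is a hedged sketch, not the poster's actual job: the source path, source format, and table name are placeholders, and it assumes a Databricks runtime where `spark` is the active SparkSession and the `cloudFiles` source is available.

```python
# Hypothetical sketch: Auto Loader with a dedicated, explicit checkpoint path.
# Runs only on a Databricks cluster; `spark` is the session provided there.
source_path = "wasbs://data@mystorage.blob.core.windows.net/*/*/*/*/"  # placeholder
checkpoint_path = "dbfs:/pipelines/raw_cooked/checkpoints/"            # dedicated location

raw_df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")               # assumed source format
          .option("cloudFiles.schemaLocation", checkpoint_path)
          .load(source_path))

# Keeping the checkpoint outside the source directory means routine cleanup of
# the source container cannot delete or corrupt the stream's metadata file.
autoloader_query = (raw_df.writeStream
                    .format("delta")
                    .trigger(once=True)
                    .option("checkpointLocation", checkpoint_path)
                    .table("raw_cooked"))

autoloader_query.awaitTermination()
```

The key design point is that the checkpoint directory (which holds the `metadata` file the error complains about) lives at a path no other job or cleanup process writes to.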
    <item>
      <title>Re: Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12940#M7688</link>
      <description>&lt;P&gt;I have this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;checkpoint_path = target_path + "checkpoints/"

autoloader_query = (raw_df.writeStream
                 .format("delta")
                 .trigger(once=True)
                 .option("checkpointLocation", checkpoint_path)
                 .partitionBy("p_ingest_date_utc", "p_ingest_hour_utc")
                 .table("raw_cooked")
                )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So I do have it set, and it was working for a long time.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 15:56:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12940#M7688</guid>
      <dc:creator>dimoobraznii</dc:creator>
      <dc:date>2021-10-20T15:56:08Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12941#M7689</link>
      <description>&lt;P&gt;Hi &lt;A href="https://community.databricks.com/s/profile/0053f000000WXBvAAO" alt="https://community.databricks.com/s/profile/0053f000000WXBvAAO" target="_blank"&gt;dimoobraznii&lt;/A&gt;&amp;nbsp;(Customer)&lt;/P&gt;&lt;P&gt;This error occurs in streaming when someone manually modifies the streaming checkpoint directory, or points one type of streaming query at the checkpoint of a different streaming type. Please check whether any changes were made to the checkpoint just before the failed run.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 19:41:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12941#M7689</guid>
      <dc:creator>Deepak_Bhutada</dc:creator>
      <dc:date>2021-10-20T19:41:32Z</dc:date>
    </item>
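Following the suggestion above to check for changes to the checkpoint, a small helper can walk a checkpoint directory, report each file's last-modified time, and flag whether the top-level `metadata` file that Structured Streaming writes is present. This is a hedged sketch for a locally mounted or DBFS-FUSE path; the function name is illustrative, and for ADLS Gen2 the same idea would go through `dbutils.fs.ls` or the storage account's diagnostic logs instead.

```python
import datetime
import os


def inspect_checkpoint(checkpoint_dir):
    """Return (has_metadata, entries), where entries is a sorted list of
    (relative_path, last_modified) for every file under checkpoint_dir."""
    entries = []
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, checkpoint_dir)
            mtime = datetime.datetime.fromtimestamp(os.path.getmtime(full))
            entries.append((rel, mtime))
    # Structured Streaming keeps a top-level `metadata` file in the checkpoint;
    # its absence is exactly what the reported error complains about.
    has_metadata = os.path.isfile(os.path.join(checkpoint_dir, "metadata"))
    return has_metadata, sorted(entries)
```

Sorting the entries by modification time instead would make it easy to spot files that changed just before the failed run.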
    <item>
      <title>Re: Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12942#M7690</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Capture"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2380i9AB0A4786E06888F/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture" alt="Capture" /&gt;&lt;/span&gt;Thank you, Deepak, I see the folder and many folders in it. How can I identify the changes if any in Azure Gen2?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 19:43:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12942#M7690</guid>
      <dc:creator>dimoobraznii</dc:creator>
      <dc:date>2021-10-20T19:43:15Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader failed</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12943#M7691</link>
      <description>&lt;P&gt;&lt;A href="https://community.databricks.com/s/profile/0053f000000WXBvAAO" alt="https://community.databricks.com/s/profile/0053f000000WXBvAAO" target="_blank"&gt;@dimoobraznii&lt;/A&gt;&amp;nbsp;(Customer)&amp;nbsp;I think the Azure storage team could help identify the changes made to the metadata file. You could check with them.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 16:14:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-failed/m-p/12943#M7691</guid>
      <dc:creator>Sandeep</dc:creator>
      <dc:date>2021-11-10T16:14:01Z</dc:date>
    </item>
  </channel>
</rss>