<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Auto Loader cloudFiles.backfillInterval in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/49596#M28592</link>
    <description>&lt;P&gt;How do we use cloudFiles.backfillInterval in our code, and which property do we need to set?&lt;/P&gt;</description>
    <pubDate>Fri, 20 Oct 2023 13:15:39 GMT</pubDate>
    <dc:creator>Kiranrathod</dc:creator>
    <dc:date>2023-10-20T13:15:39Z</dc:date>
    <item>
      <title>Databricks Auto Loader cloudFiles.backfillInterval</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37915#M26509</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have been reading the Databricks Auto Loader documentation on the cloudFiles.backfillInterval configuration, and I still have a question about one detail of how it works. I could only find examples of it being set to 1 day or 1 week, so I'm assuming you can enter any interval, such as x hours, x days, x weeks, or x months. My question is: how does it use that 1 week to backfill?&lt;/P&gt;&lt;P&gt;Does it look at the lastModified time of the unprocessed files arriving in the &lt;SPAN&gt;input directory and check whether currentTime - lastModified &amp;lt;= backfillInterval?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Or does it run the backfill once a week, so that if I last ran the Databricks Auto Loader pipeline a week ago, it will perform a backfill now? In that case, would the backfill just look through all the files in the input directory and the cloud_file_state and make sure all of them have been processed?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I don't have a clear picture of what exactly backfillInterval does, but it seems useful: the documentation says it guarantees that 100% of files get processed.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 05:33:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37915#M26509</guid>
      <dc:creator>therealchainman</dc:creator>
      <dc:date>2023-07-19T05:33:21Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Auto Loader cloudFiles.backfillInterval</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37980#M26529</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84997"&gt;@therealchainman&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The backFillInterval option is provided to make sure eventually all the files are inserted. When you create a new stream, some files might be missed that are not ingested. BackFill is an asynchronous process which is trigerred based on the interval defined by&amp;nbsp;backFillInterval option. This checks for all the files that have been missed and ingests those files&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 19:01:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37980#M26529</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-07-19T19:01:48Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Auto Loader cloudFiles.backfillInterval</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37983#M26531</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/84997"&gt;@therealchainman&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The finish time of the last backfill (&lt;SPAN&gt;lastBackfillFinishTimeMs&lt;/SPAN&gt;) is recorded in the offset files of the stream checkpoint. This tells Auto Loader when the last backfill ran, so it knows when to trigger the next periodic backfill.&lt;/P&gt;&lt;P&gt;Hope this answers your question.&lt;/P&gt;</description>
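      <!--
The scheduling rule implied by this reply can be sketched as a simple elapsed-time check. This is a hypothetical model, not Databricks code: it assumes a backfill becomes due once backfillInterval has elapsed since lastBackfillFinishTimeMs. The helpers `interval_to_ms` and `backfill_due` are illustrative, and the parser only accepts second, minute, hour, day and week units.

```python
# Illustrative model only, not Databricks source code. It assumes a backfill
# is due once backfillInterval has elapsed since lastBackfillFinishTimeMs, the
# timestamp the reply says is stored in the checkpoint offset files. The
# helper names and the supported units are hypothetical.
import re

_UNIT_MS = {
    "second": 1_000,
    "minute": 60 * 1_000,
    "hour": 60 * 60 * 1_000,
    "day": 24 * 60 * 60 * 1_000,
    "week": 7 * 24 * 60 * 60 * 1_000,
}


def interval_to_ms(interval: str) -> int:
    """Convert a string such as '1 day' or '12 hours' to milliseconds."""
    match = re.fullmatch(
        r"\s*(\d+)\s*(second|minute|hour|day|week)s?\s*", interval.lower()
    )
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_MS[unit]


def backfill_due(now_ms: int, last_backfill_finish_ms: int, interval: str) -> bool:
    """True once the interval has elapsed since the last backfill finished."""
    return now_ms - last_backfill_finish_ms >= interval_to_ms(interval)


if __name__ == "__main__":
    day = _UNIT_MS["day"]
    last = 1 * day  # stands in for lastBackfillFinishTimeMs from the checkpoint
    print(backfill_due(6 * day, last, "1 week"))  # False: 5 days elapsed
    print(backfill_due(8 * day, last, "1 week"))  # True: 7 days elapsed
```

Under this model the interval is measured from the previous backfill, not from each file's lastModified time, which matches the second reading in the original question.
      -->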
      <pubDate>Wed, 19 Jul 2023 19:14:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/37983#M26531</guid>
      <dc:creator>saipujari_spark</dc:creator>
      <dc:date>2023-07-19T19:14:38Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Auto Loader cloudFiles.backfillInterval</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/49596#M28592</link>
      <description>&lt;P&gt;How do we use cloudFiles.backfillInterval in our code, and which property do we need to set?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2023 13:15:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/49596#M28592</guid>
      <dc:creator>Kiranrathod</dc:creator>
      <dc:date>2023-10-20T13:15:39Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Auto Loader cloudFiles.backfillInterval</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/56966#M30698</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/91988"&gt;@Kiranrathod&lt;/a&gt;, you can set the "&lt;SPAN&gt;cloudFiles.backfillInterval&lt;/SPAN&gt;" option on your Auto Loader stream to enable backfills. Please refer to the documentation: &lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/options.html#:~:text=cloudFiles.backfillInterval" target="_blank"&gt;https://docs.databricks.com/en/ingestion/auto-loader/options.html#:~:text=cloudFiles.backfillInterval&lt;/A&gt;&lt;/P&gt;</description>
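      <!--
Setting the option in code looks roughly like this. It is a minimal sketch, assuming a Databricks notebook or job where `spark` is an active SparkSession; the helper names `autoloader_options` and `build_autoloader_stream`, the JSON source format and any paths are hypothetical. The cloudFiles.format, cloudFiles.schemaLocation and cloudFiles.backfillInterval keys are Auto Loader options, and the interval is passed as a string such as "1 day" or "1 week".

```python
# Minimal sketch, assuming a Databricks notebook or job where `spark` is an
# active SparkSession. The helper names, source format and paths are
# hypothetical; the cloudFiles.* keys are Auto Loader options.


def autoloader_options(source_format: str, schema_location: str,
                       backfill_interval: str = "1 week") -> dict:
    """Return the cloudFiles.* options for an Auto Loader stream."""
    return {
        "cloudFiles.format": source_format,
        "cloudFiles.schemaLocation": schema_location,
        # Trigger an asynchronous backfill about once per interval so that
        # files missed by incremental discovery are eventually ingested.
        "cloudFiles.backfillInterval": backfill_interval,
    }


def build_autoloader_stream(spark, source_path: str, schema_location: str,
                            backfill_interval: str = "1 week"):
    """Return a streaming DataFrame that reads source_path with Auto Loader."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options("json", schema_location,
                                         backfill_interval).items():
        reader = reader.option(key, value)
    return reader.load(source_path)
```

On a cluster you would call `build_autoloader_stream(spark, source_path, schema_location, "1 day")` and write the result with `writeStream.option("checkpointLocation", ...)`; that checkpoint is where the backfill bookkeeping is kept between runs.
      -->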
      <pubDate>Thu, 11 Jan 2024 16:57:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-auto-loader-cloudfiles-backfillinterval/m-p/56966#M30698</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2024-01-11T16:57:47Z</dc:date>
    </item>
  </channel>
</rss>

