<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reprocessing the data with Auto Loader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reprocessing-the-data-with-auto-loader/m-p/40136#M27136</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/86760"&gt;@Eldar_Dragomir&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To reprocess the data, you have to change the checkpoint directory. With a new checkpoint, the stream starts processing the files from the beginning. You can use &lt;STRONG&gt;cloudFiles.maxFilesPerTrigger&lt;/STRONG&gt; to limit the number of files processed per micro-batch, which helps keep the pipeline stable.&lt;/P&gt;</description>
    <pubDate>Thu, 17 Aug 2023 04:13:30 GMT</pubDate>
    <dc:creator>Tharun-Kumar</dc:creator>
    <dc:date>2023-08-17T04:13:30Z</dc:date>
    <item>
      <title>Reprocessing the data with Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/reprocessing-the-data-with-auto-loader/m-p/40127#M27133</link>
      <description>&lt;P&gt;Could you please give me an idea of how I can start reprocessing my data?&lt;BR /&gt;Imagine I have a folder in ADLS Gen2, "/test", containing binary files. They have already been processed by the current pipeline.&lt;BR /&gt;I want to reprocess the data and continue to receive new data.&lt;BR /&gt;What settings do I need for that?&lt;BR /&gt;Do I need two "loads", or can I use one with Trigger.AvailableNow and a limit on the number of files per batch?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2023 22:59:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reprocessing-the-data-with-auto-loader/m-p/40127#M27133</guid>
      <dc:creator>Eldar_Dragomir</dc:creator>
      <dc:date>2023-08-16T22:59:38Z</dc:date>
    </item>
    <item>
      <title>Re: Reprocessing the data with Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/reprocessing-the-data-with-auto-loader/m-p/40136#M27136</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/86760"&gt;@Eldar_Dragomir&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To reprocess the data, you have to change the checkpoint directory. With a new checkpoint, the stream starts processing the files from the beginning. You can use &lt;STRONG&gt;cloudFiles.maxFilesPerTrigger&lt;/STRONG&gt; to limit the number of files processed per micro-batch, which helps keep the pipeline stable.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 04:13:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reprocessing-the-data-with-auto-loader/m-p/40136#M27136</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-08-17T04:13:30Z</dc:date>
    </item>
  </channel>
</rss>

