<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When should you use the directory listing vs file notification in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/when-should-you-use-the-directory-listing-vs-file-notification/m-p/5064#M234</link>
    <description>&lt;P&gt;@Bennett Lambert:&lt;/P&gt;&lt;P&gt;The choice between "file notification" and "directory listing" for Auto Loader in Delta Live Tables depends on your use case and requirements. Note that the mode itself is selected with cloudFiles.useNotifications (false, the default, means directory listing); cloudFiles.useIncrementalListing only affects how directory listing mode scans. Here are some general guidelines:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Use file notification if you need low-latency detection of new files: File notification relies on storage events for the source location, so new files are discovered as they arrive rather than by scanning. How soon they are actually ingested still depends on how the pipeline is triggered.&lt;/LI&gt;&lt;LI&gt;Use directory listing if you ingest at a controlled frequency: Directory listing scans the source storage location for new files each time the stream checks for input, so it fits pipelines that run on a schedule or need to control the timing of ingestion. It is also the default mode and needs no event setup.&lt;/LI&gt;&lt;LI&gt;Use file notification for large numbers of files: File notification is more efficient when many (often small) files accumulate, because it avoids scanning the entire directory for changes.&lt;/LI&gt;&lt;LI&gt;Use directory listing for small numbers of files: Directory listing works well when the directory holds relatively few (often large) files, because the cost of each scan stays low. The number of files, rather than their size, is what drives that cost.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;In summary, if you need low-latency detection and have a large number of small files, use file notification. If you ingest at a controlled frequency or have a small number of large files, use directory listing.&lt;/P&gt;</description>
    <pubDate>Sat, 13 May 2023 16:58:07 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-05-13T16:58:07Z</dc:date>
    <item>
      <title>When should you use the directory listing vs file notification</title>
      <link>https://community.databricks.com/t5/machine-learning/when-should-you-use-the-directory-listing-vs-file-notification/m-p/5063#M233</link>
      <description>&lt;P&gt;We are using Delta Live Tables to run ingestion pipelines and have come across two options for Auto Loader: "file notification" and "directory listing". This is reflected in the option cloudFiles.useIncrementalListing. What are the best practices for choosing between them, and when should we use one over the other?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Apr 2023 08:37:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/when-should-you-use-the-directory-listing-vs-file-notification/m-p/5063#M233</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-04-28T08:37:12Z</dc:date>
    </item>
    <item>
      <title>Re: When should you use the directory listing vs file notification</title>
      <link>https://community.databricks.com/t5/machine-learning/when-should-you-use-the-directory-listing-vs-file-notification/m-p/5064#M234</link>
      <description>&lt;P&gt;@Bennett Lambert:&lt;/P&gt;&lt;P&gt;The choice between "file notification" and "directory listing" for Auto Loader in Delta Live Tables depends on your use case and requirements. Note that the mode itself is selected with cloudFiles.useNotifications (false, the default, means directory listing); cloudFiles.useIncrementalListing only affects how directory listing mode scans. Here are some general guidelines:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Use file notification if you need low-latency detection of new files: File notification relies on storage events for the source location, so new files are discovered as they arrive rather than by scanning. How soon they are actually ingested still depends on how the pipeline is triggered.&lt;/LI&gt;&lt;LI&gt;Use directory listing if you ingest at a controlled frequency: Directory listing scans the source storage location for new files each time the stream checks for input, so it fits pipelines that run on a schedule or need to control the timing of ingestion. It is also the default mode and needs no event setup.&lt;/LI&gt;&lt;LI&gt;Use file notification for large numbers of files: File notification is more efficient when many (often small) files accumulate, because it avoids scanning the entire directory for changes.&lt;/LI&gt;&lt;LI&gt;Use directory listing for small numbers of files: Directory listing works well when the directory holds relatively few (often large) files, because the cost of each scan stays low. The number of files, rather than their size, is what drives that cost.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;In summary, if you need low-latency detection and have a large number of small files, use file notification. If you ingest at a controlled frequency or have a small number of large files, use directory listing.&lt;/P&gt;</description>
      <pubDate>Sat, 13 May 2023 16:58:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/when-should-you-use-the-directory-listing-vs-file-notification/m-p/5064#M234</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-05-13T16:58:07Z</dc:date>
    </item>
  </channel>
</rss>

