<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: autoloader data processing in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/autoloader-data-processing/m-p/83538#M3637</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36892"&gt;@Phani1&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The folder structure you plan to use makes sense to me. Since you've mentioned that there will be thousands of files, the best practice is to use Auto Loader in file notification mode.&lt;/P&gt;&lt;P&gt;You can also read the Databricks recommendations:&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-notification-mode" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-notification-mode&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html" target="_blank"&gt;https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 20 Aug 2024 06:04:57 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2024-08-20T06:04:57Z</dc:date>
    <item>
      <title>autoloader data processing</title>
      <link>https://community.databricks.com/t5/get-started-discussions/autoloader-data-processing/m-p/83534#M3636</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;Can you share best practices for designing Auto Loader data processing?&lt;/P&gt;&lt;P&gt;We have data from 30 countries arriving in various files. Currently, we are thinking of using a root folder, i.e. country, with subfolders for the individual countries.&lt;/P&gt;&lt;P&gt;In the Auto Loader script, we plan to set the path to the root folder. Is this a good approach? Please advise on the best way to handle thousands of files.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Phani&lt;/P&gt;</description>
      <pubDate>Tue, 20 Aug 2024 05:42:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/autoloader-data-processing/m-p/83534#M3636</guid>
      <dc:creator>Phani1</dc:creator>
      <dc:date>2024-08-20T05:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: autoloader data processing</title>
      <link>https://community.databricks.com/t5/get-started-discussions/autoloader-data-processing/m-p/83538#M3637</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36892"&gt;@Phani1&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The folder structure you plan to use makes sense to me. Since you've mentioned that there will be thousands of files, the best practice is to use Auto Loader in file notification mode.&lt;/P&gt;&lt;P&gt;You can also read the Databricks recommendations:&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-notification-mode" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-notification-mode&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html" target="_blank"&gt;https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html&lt;/A&gt;&lt;/P&gt;</description>
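      <!--
      Editorial sketch, not part of the original reply: the advice above (point Auto Loader
      at the root country folder and turn on file notification mode) might look roughly like
      the following in PySpark. The helper names, the "json" format, and the schema path are
      assumptions made for illustration; only the cloudFiles option keys come from the
      Auto Loader documentation linked in the reply.

```python
from typing import Dict


def auto_loader_options(file_format: str, schema_location: str,
                        use_notifications: bool = True) -> Dict[str, str]:
    """Build Auto Loader (cloudFiles) reader options as plain strings."""
    return {
        "cloudFiles.format": file_format,
        # File notification mode: new files are discovered from storage events
        # instead of listing the directory tree on every micro-batch.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
        # Location where Auto Loader keeps the inferred schema (caller-supplied).
        "cloudFiles.schemaLocation": schema_location,
    }


def read_country_root(spark, root_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame that reads from the root folder.

    `spark` is assumed to be the SparkSession a Databricks notebook or job provides.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(root_path)


# Hypothetical values, for illustration only.
options = auto_loader_options("json", "/tmp/example/_schemas/country")
```

      In a notebook this would be used as read_country_root(spark, root_path, options),
      followed by a writeStream with its own checkpoint location. File notification mode
      also needs cloud-side permissions to create the event subscription and queue; the
      linked documentation covers those per cloud provider.
      -->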
      <pubDate>Tue, 20 Aug 2024 06:04:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/autoloader-data-processing/m-p/83538#M3637</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-08-20T06:04:57Z</dc:date>
    </item>
  </channel>
</rss>

