<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Auto Loader Use Case Question - Centralized Dropzone to Bronze? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55451#M2011</link>
    <pubDate>Mon, 18 Dec 2023 17:37:27 GMT</pubDate>
    <dc:creator>ChristianRRL</dc:creator>
    <dc:date>2023-12-18T17:37:27Z</dc:date>
    <item>
      <title>Auto Loader Use Case Question - Centralized Dropzone to Bronze?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55202#M1976</link>
      <description>&lt;P&gt;Good day,&lt;/P&gt;&lt;P&gt;I am trying to use Auto Loader (potentially extending into DLT in the future) to pull data coming from an external system (currently landing in a single location), then organize it and load it into the appropriate locations. I am struggling quite a bit at the moment and would really appreciate some feedback. Please let me know if any of my assumptions or my approach is incorrect.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Here is an idealized version of what we want and intend to do:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;We expect data from multiple different kinds of sites/databases/tables to land in a &lt;STRONG&gt;single&lt;/STRONG&gt; "&lt;STRONG&gt;dropzone&lt;/STRONG&gt;" in our data lake&lt;/LI&gt;&lt;LI&gt;As soon as data files land in the dropzone, we would like the data to be &lt;STRONG&gt;sorted&lt;/STRONG&gt; into its respective landing location (aka: bronze or raw location), &lt;STRONG&gt;organized by the kind of data it is&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;Once in the landing or bronze location, we should be able to use &lt;STRONG&gt;Auto Loader (and/or DLT)&lt;/STRONG&gt; to easily process our data in accordance with the "Medallion Architecture" model&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The struggle we currently have is that once the data lands in the "dropzone", I can't see an effective/lean way to programmatically/automatically move it from the dropzone to its intended location. I've looked into ways of moving data, but I haven't found anything as straightforward as I expected this to be.&lt;/P&gt;&lt;P&gt;I thought Auto Loader would be an easy way to pull data from various databases that lands in a centralized location. However, the more I look into it, the more it seems that Auto Loader just "assumes" the data will already be in the "landing" location (aka: bronze, aka: raw). 
If this is the case, the value of Auto Loader to me is greatly diminished.&lt;/P&gt;&lt;P&gt;Guidance and feedback on this would be *greatly* appreciated!&lt;/P&gt;&lt;P&gt;Below is a brief "sample" of what we're currently thinking:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;# dropzone template&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../&amp;lt;data_source_name&amp;gt;/dropzone/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;---&lt;BR /&gt;# dropzone example&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_A.site_number_1.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_B.site_number_1.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_C.site_number_1.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_A.site_number_2.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_B.site_number_2.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_C.site_number_2.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ...&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_X.Schema.Table_A.site_number_x.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_X.Schema.Table_B.site_number_x.oem_shortname.timestamp.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_X.Schema.Table_C.site_number_x.oem_shortname.timestamp.csv&lt;BR /&gt;---&lt;BR /&gt;# bronze template&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; .../&amp;lt;data_source_name&amp;gt;/&amp;lt;category&amp;gt;/bronze/&amp;lt;oem_shortname&amp;gt;/&amp;lt;site_number&amp;gt;/&amp;lt;linted_database&amp;gt;/&amp;lt;linted_table_name&amp;gt;/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;---&lt;BR /&gt;# bronze example&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/ge/1/database_a/table_a/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/ge/1/database_a/table_b/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/ge/1/database_a/table_c/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/siemens/2/database_b/table_a/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/siemens/2/database_b/table_b/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/siemens/2/database_b/table_c/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/ge/3/database_c/table_a/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/ge/3/database_c/table_b/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/ge/3/database_c/table_c/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/siemens/4/database_d/table_a/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/siemens/4/database_d/table_c/&amp;lt;full_file_name&amp;gt;.csv&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/mechanical/bronze/siemens/4/database_d/table_d/&amp;lt;full_file_name&amp;gt;.csv&lt;/P&gt;</description>
      <pubDate>Wed, 13 Dec 2023 17:48:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55202#M1976</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2023-12-13T17:48:55Z</dc:date>
    </item>
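    <!--
The dropzone-to-bronze mapping in the templates above is mechanical enough to express as a small pure function. The sketch below is one possible reading, not an established Databricks utility. It assumes the file name always has exactly seven dot-separated parts (so the timestamp itself contains no dots), that the dropzone folder sits directly under `<data_source_name>`, and that the category (electrical/mechanical), which the file name does not carry, comes from a hypothetical `CATEGORY_BY_DATABASE` lookup. `lint` here only lowercases; any real linting rules would replace it.

```python
from pathlib import PurePosixPath

# HYPOTHETICAL lookup: the dropzone file name carries no category
# (electrical / mechanical), so it must come from a separate mapping.
CATEGORY_BY_DATABASE = {
    "database_a": "electrical",
    "database_b": "electrical",
    "database_c": "mechanical",
    "database_d": "mechanical",
}


def lint(name: str) -> str:
    """Normalise an identifier for use as a folder name (here: just lowercase)."""
    return name.strip().lower()


def bronze_path_for(dropzone_file: str, category_by_database: dict) -> str:
    """Map one dropzone file to its bronze location.

    Assumed file name (exactly seven dot-separated parts):
        <Database>.<Schema>.<Table>.site_number_<N>.<oem_shortname>.<timestamp>.csv
    Assumed layout: .../<data_source_name>/dropzone/<file>
    """
    path = PurePosixPath(dropzone_file)
    if path.parent.name != "dropzone":
        raise ValueError(f"not in a dropzone folder: {dropzone_file}")

    parts = path.name.split(".")
    if len(parts) != 7 or parts[-1].lower() != "csv":
        raise ValueError(f"unexpected file name: {path.name}")
    database, _schema, table, site, oem, _timestamp, _ext = parts

    source_root = path.parent.parent  # .../<data_source_name>
    category = category_by_database[lint(database)]  # KeyError if unmapped
    site_number = site.removeprefix("site_number_")

    # .../<data_source_name>/<category>/bronze/<oem>/<site>/<db>/<table>/<file>
    target = (
        source_root / category / "bronze" / lint(oem) / site_number
        / lint(database) / lint(table) / path.name
    )
    return str(target)
```

For a hypothetical file `.../source_x/dropzone/Database_A.Schema.Table_A.site_number_1.ge.20231213.csv`, this returns `.../source_x/electrical/bronze/ge/1/database_a/table_a/Database_A.Schema.Table_A.site_number_1.ge.20231213.csv`. Keeping the mapping as a pure function makes it easy to test on its own and to reuse from whichever step ends up doing the move or the ingest.
    -->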
    <item>
      <title>Re: Auto Loader Use Case Question - Centralized Dropzone to Bronze?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55265#M1989</link>
      <description>&lt;P&gt;I've actually looked through those sources fairly extensively, but I'm still unsure whether Auto Loader is the right solution. What I was hoping is that data can land in the "dropzone" and afterwards be *moved* to the corresponding path, as I showed earlier. For example (for a single file):&lt;/P&gt;&lt;P&gt;# dropzone example&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/dropzone/Database_A.Schema.Table_A.site_number_1.oem_shortname.timestamp.csv&lt;/P&gt;&lt;P&gt;# bronze example&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; .../source_x/electrical/bronze/ge/1/database_a/table_a/&amp;lt;full_file_name&amp;gt;.csv&lt;/P&gt;&lt;P&gt;However, Auto Loader seems to be built around targeting one specific path. The issue is that if I use the dropzone as my bronze-level path, it contains a mix of raw CSV files for many different tables, sometimes from different databases too. So does Auto Loader specifically support *moving* files from one centralized DBFS location (the dropzone in this case) to the respective bronze Delta table locations?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Dec 2023 16:44:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55265#M1989</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2023-12-14T16:44:13Z</dc:date>
    </item>
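    <!--
Auto Loader ingests files from the path it is pointed at, so the physical move described in this reply would be a separate step that runs before ingestion. A minimal local sketch of that step follows. It uses `shutil.move` on a local filesystem as a stand-in for `dbutils.fs.mv` on Databricks, and `by_database_and_table` is a hypothetical routing rule (bronze root / database / table) that could be swapped for a fuller mapping such as the bronze template in the original post.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def route_dropzone(dropzone: Path, resolve_target: Callable[[Path], Path]) -> List[Path]:
    """Move every CSV in `dropzone` into the folder chosen by `resolve_target`.

    Local-filesystem stand-in: on Databricks the move would be
    dbutils.fs.mv(src, dst) rather than shutil.move.
    Returns the destination paths in sorted source-name order.
    """
    moved = []
    # sorted() materialises the listing first, so moving files while looping is safe
    for src in sorted(dropzone.glob("*.csv")):
        target_dir = resolve_target(src)
        target_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
        dst = target_dir / src.name                     # keep the original file name
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


def by_database_and_table(bronze_root: Path) -> Callable[[Path], Path]:
    """HYPOTHETICAL routing rule: bronze_root/<database>/<table>, lowercased."""
    def resolve(src: Path) -> Path:
        database, _schema, table = src.name.split(".")[:3]
        return bronze_root / database.lower() / table.lower()
    return resolve
```

One design consequence worth weighing: once files are routed into per-table folders, each bronze table can get its own Auto Loader stream pointed at its own folder, which is the one-to-one case. The trade-off is an extra job that must run (and be monitored) before ingestion, and a file that is still being written when the router runs could be moved before it is complete.
    -->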
    <item>
      <title>Re: Auto Loader Use Case Question - Centralized Dropzone to Bronze?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55451#M2011</link>
      <description>&lt;P&gt;Quick follow-up on this&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;(or to anyone else in the Databricks multi-verse who can help clarify this case).&lt;/P&gt;&lt;P&gt;I understand that the proposed solution would work for a "&lt;STRONG&gt;one-to-one&lt;/STRONG&gt;" case, where many files land in a specific DBFS path and are ingested by Auto Loader into a specific bronze table. It would also work for a similar "&lt;STRONG&gt;many-to-one&lt;/STRONG&gt;" case, where we expect data from different sources to go into a single bronze table, but, critically, all of the source data has the same format (i.e. the same column count/type/order) and is consolidated in a single bronze raw location.&lt;/P&gt;&lt;P&gt;In my case, what I'm asking is whether Auto Loader supports a "&lt;STRONG&gt;many-to-many&lt;/STRONG&gt;" case (i.e. "&lt;STRONG&gt;&lt;FONT color="#339966"&gt;many source database_table.csv&lt;/FONT&gt;-to-&lt;FONT color="#FF0000"&gt;many target database_table_bronze&lt;/FONT&gt;&lt;/STRONG&gt;"), or, if not, what the suggested Databricks approach is. By this I mean that many source locations, with many different kinds of tables, are all loaded into a single "dropzone" location. From that single "dropzone", I would expect either that Auto Loader natively supports my use case or, if not, that there is a fairly standard approach to handling it.&lt;/P&gt;&lt;P&gt;I'm not the only person dealing with this; I found a similar question on Stack Overflow:&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/69572265/ingest-several-types-of-csvs-with-databricks-auto-loader" target="_blank"&gt;https://stackoverflow.com/questions/69572265/ingest-several-types-of-csvs-with-databricks-auto-loader&lt;/A&gt;. 
The solution proposed there was to use "&lt;STRONG&gt;pathGlobFilter&lt;/STRONG&gt;"; however, I believe this only handles the first two cases, not the "many-to-many" case I'm asking about here.&lt;/P&gt;&lt;P&gt;For additional context, here's the code I have so far:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ChristianRRL_1-1702920099476.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5631iCF9BCB77B8838BFF/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="ChristianRRL_1-1702920099476.png" alt="ChristianRRL_1-1702920099476.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Dec 2023 17:37:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/auto-loader-use-case-question-centralized-dropzone-to-bronze/m-p/55451#M2011</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2023-12-18T17:37:27Z</dc:date>
    </item>
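    <!--
One way to read the many-to-many case is as N one-to-one cases that share a source directory: start one Auto Loader stream per target table, all pointed at the same dropzone, each with its own `pathGlobFilter`, schema location, and checkpoint. The sketch below assumes the file-name convention from the original post, a `spark` session as provided in a Databricks notebook, and a hypothetical `database_table_bronze` naming rule for target tables. It is a starting point under those assumptions, not a confirmed Databricks recommendation.

```python
from typing import List, Tuple

# `spark` is the SparkSession a Databricks notebook provides; nothing below runs
# until start_bronze_streams() is called.


def glob_for(database: str, schema: str, table: str) -> str:
    """pathGlobFilter pattern that matches only one source table's files."""
    return f"{database}.{schema}.{table}.*.csv"


def bronze_table_name(database: str, table: str) -> str:
    """HYPOTHETICAL naming rule mirroring database_table_bronze, lowercased."""
    return f"{database}_{table}_bronze".lower()


def start_bronze_streams(spark, dropzone: str, checkpoint_root: str,
                         source_tables: List[Tuple[str, str, str]]) -> list:
    """Start one Auto Loader stream per (database, schema, table) in the dropzone."""
    queries = []
    for database, schema, table in source_tables:
        target = bronze_table_name(database, table)
        checkpoint = f"{checkpoint_root}/{target}"  # one checkpoint per target table
        df = (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.schemaLocation", checkpoint)  # per-table inferred schema
            .option("header", "true")
            .option("pathGlobFilter", glob_for(database, schema, table))
            .load(dropzone)
        )
        queries.append(
            df.writeStream
            .option("checkpointLocation", checkpoint)
            .trigger(availableNow=True)  # process what has landed, then stop
            .toTable(target)             # starts the streaming query
        )
    return queries
```

Usage would look like `start_bronze_streams(spark, "<dropzone path>", "<checkpoint root>", [("Database_A", "Schema", "Table_A"), ("Database_A", "Schema", "Table_B")])`, with the table list kept in config. Because every stream lists the same directory, listing cost grows with the number of tables. An alternative worth comparing is a single stream over the whole dropzone whose `foreachBatch` function splits each micro-batch by source file (available as `_metadata.file_path`) and appends each group to its table, though that needs extra care because the tables have different columns.
    -->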
  </channel>
</rss>

