<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Should data in Raw/Bronze be in Catalog? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108668#M43111</link>
    <description>&lt;P&gt;No, leave the raw data out of the catalog.&amp;nbsp; I'd recommend doing something a little different.&lt;/P&gt;&lt;P&gt;Consider the zip files as "landed", not "raw".&amp;nbsp; Consider the unzipped data to be the raw data.&lt;/P&gt;&lt;P&gt;In your schema in the bronze layer, configure an external location for the raw data.&amp;nbsp; In that same schema, make tables (DLT or regular Delta tables, depending on your needs) and load the raw data from the external location into the bronze tables.&amp;nbsp; No modification, just flatten the data.&amp;nbsp; This is your working bronze layer.&amp;nbsp; This will take a little extra work up front, but tables will give you much better performance than working directly with the raw json/csv/xml/whatever, and will also give you access control and permissions governance over the raw data.&amp;nbsp; Then do your bronze&amp;gt;&amp;gt;silver and silver&amp;gt;&amp;gt;gold as usual.&lt;/P&gt;</description>
    <pubDate>Mon, 03 Feb 2025 22:16:01 GMT</pubDate>
    <dc:creator>Rjdudley</dc:creator>
    <dc:date>2025-02-03T22:16:01Z</dc:date>
    <item>
      <title>Should data in Raw/Bronze be in Catalog?</title>
      <link>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108666#M43109</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;What are the benefits of not "registering" Raw data in Unity Catalog when the data in Raw will be in its original format, such as .csv, .json, .parquet, etc.?&lt;/P&gt;&lt;P&gt;An example scenario could be:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Data arrives at Landing as .zip.&lt;/LI&gt;&lt;LI&gt;The zip will be verified for correctness and saved to Raw as-is, in a pre-defined folder structure.&lt;/LI&gt;&lt;LI&gt;Unity Catalog will not know about these files.&lt;/LI&gt;&lt;LI&gt;The next layer (Silver) will be in Catalog.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I thought that the data in Raw should be in Catalog for various benefits (as stated in the documentation). But what would be the benefits of not adding it to Catalog?&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;N.Z.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 21:55:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108666#M43109</guid>
      <dc:creator>cdn_yyz_yul</dc:creator>
      <dc:date>2025-02-03T21:55:24Z</dc:date>
    </item>
    <item>
      <title>Re: Should data in Raw/Bronze be in Catalog?</title>
      <link>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108668#M43111</link>
      <description>&lt;P&gt;No, leave the raw data out of the catalog.&amp;nbsp; I'd recommend doing something a little different.&lt;/P&gt;&lt;P&gt;Consider the zip files as "landed", not "raw".&amp;nbsp; Consider the unzipped data to be the raw data.&lt;/P&gt;&lt;P&gt;In your schema in the bronze layer, configure an external location for the raw data.&amp;nbsp; In that same schema, make tables (DLT or regular Delta tables, depending on your needs) and load the raw data from the external location into the bronze tables.&amp;nbsp; No modification, just flatten the data.&amp;nbsp; This is your working bronze layer.&amp;nbsp; This will take a little extra work up front, but tables will give you much better performance than working directly with the raw json/csv/xml/whatever, and will also give you access control and permissions governance over the raw data.&amp;nbsp; Then do your bronze&amp;gt;&amp;gt;silver and silver&amp;gt;&amp;gt;gold as usual.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 22:16:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108668#M43111</guid>
      <dc:creator>Rjdudley</dc:creator>
      <dc:date>2025-02-03T22:16:01Z</dc:date>
    </item>
    <item>
      <title>Re: Should data in Raw/Bronze be in Catalog?</title>
      <link>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108670#M43112</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107723"&gt;@Rjdudley&lt;/a&gt;&lt;/P&gt;&lt;P&gt;I meant to say, the scenario is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Data arrives at Landing as .zip.&lt;/LI&gt;&lt;LI&gt;The zip will be verified for correctness and &lt;STRONG&gt;then unzipped&lt;/STRONG&gt;; the &lt;STRONG&gt;extracted files&lt;/STRONG&gt; will be saved to Raw as-is, in a pre-defined folder structure.&lt;/LI&gt;&lt;LI&gt;Unity Catalog will not know about these files.&lt;/LI&gt;&lt;LI&gt;The next layer (Silver) will be in Catalog.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I agree with your reply myself. But I had received a proposal which suggested not "cataloging" Raw and instead using another tool to meet the need of searching files in Raw.&lt;BR /&gt;I would like to understand from the community whether there are benefits in doing so.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 22:35:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108670#M43112</guid>
      <dc:creator>cdn_yyz_yul</dc:creator>
      <dc:date>2025-02-03T22:35:56Z</dc:date>
    </item>
    <item>
      <title>Re: Should data in Raw/Bronze be in Catalog?</title>
      <link>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108802#M43147</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145837"&gt;@cdn_yyz_yul&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;But I had received a proposal which suggested a scenario of not "cataloging" Raw, instead, using another tool to achieve the need of searching files in Raw.&amp;nbsp;&lt;BR /&gt;I would like to understand if there are benefits in doing so,&amp;nbsp; from the community.&amp;nbsp;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;This is where "it depends" on what your company's setup is, but maybe I can provide some food for thought.&amp;nbsp; Do you already have this other tool, or is it a new purchase?&amp;nbsp; Did this proposal come from a tool vendor or a consultant, or from your CTO?&amp;nbsp; Do you have an ongoing need to search the raw files which would require an additional tool?&amp;nbsp; What business capabilities is this tool going to fulfill: just cataloging and searching, or is it a governance tool like Atlan/Alation/Collibra?&lt;/P&gt;&lt;P&gt;Under most use cases you would not need an additional tool to search raw files.&amp;nbsp; You'd either transform the data or create table metadata in Unity Catalog from the files and work with the files directly.&amp;nbsp; The advantage of Unity Catalog is that you have all of the same security settings and data classifications, a familiar UI, and only one thing to administer.&lt;/P&gt;&lt;P&gt;We're in the second year of our Databricks implementation.&amp;nbsp; My approach has been to wait and see whether an actual need arises and whether Databricks comes out with a feature that solves it.&amp;nbsp; We saw some really nice tools at Data+AI Summit, and shiny new things are easy to get excited about, but I always assess the actual business need and any lack of features before I expand.&lt;/P&gt;&lt;P&gt;If it's not abundantly clear why you need this extra tooling, ask back "what business capabilities are we realizing, or what features are we lacking?"&amp;nbsp; If the features of the tool overlap with features in Databricks, keep in mind that my experience using Databricks has been positive largely because of the integration and simple management.&amp;nbsp; Hope that helps.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Feb 2025 14:38:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shoud-data-in-raw-bronze-be-in-catalog/m-p/108802#M43147</guid>
      <dc:creator>Rjdudley</dc:creator>
      <dc:date>2025-02-04T14:38:27Z</dc:date>
    </item>
  </channel>
</rss>

