<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pros and cons of physically separating data in different storage accounts and containers in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/64353#M6854</link>
    <description>&lt;P class="p1"&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/92609"&gt;@pernilak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="p1"&gt;Thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should all catalogs and the metastore reside in the same storage account (but different containers)&lt;/FONT&gt;&lt;/P&gt;
&lt;DIV id="tinyMceEditor_37c1f7baeccd6braphaelblg_0" class="mceNonEditable lia-copypaste-placeholder"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="raphaelblg_1-1711062085475.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6752iA0B9079DE6E9ED06/image-size/medium?v=v2&amp;amp;px=400" role="button" title="raphaelblg_1-1711062085475.png" alt="raphaelblg_1-1711062085475.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Yes, Databricks recommends having one separate storage location (container) per catalog. But you can also have one single container for the whole metastore (metastore-level storage). If you need to isolate your data at infrastructure level (i.e separate storage accounts) then the best practice is to use &lt;A title="External Locations" href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_self"&gt;External Locations&lt;/A&gt;&amp;nbsp;but you can't create a whole catalog in an external location, only other small entities such as tables or volumes.&lt;/P&gt;
&lt;P&gt;For information to help you decide whether you need metastore-level storage, see&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/get-started#metastore-storage" target="_blank" rel="noopener" data-linktype="relative-path"&gt;(Optional) Create metastore-level storage&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/best-practices#physically-separate" target="_blank" rel="noopener" data-linktype="relative-path"&gt;Data is physically separated in storage&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should the metastore have one storage account and other catalogs reside in a different one (separate containers)&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;Metastore and catalogs should reside in the same storage account, you can have one container per catalog or one container for all metastore entities, it's up to you to decide. My answer for your question no.1 has the auxiliar doc urls that should help you understand which option is better to you.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should dev, test and prod catalogs be in different storage accounts?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;I don't think that it's possible, if you want to work with separate storage accounts then you should use&amp;nbsp;&lt;A title="External Locations" href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_self"&gt;External Locations.&lt;/A&gt;&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should one domain (we have catalogs based on domain) be in one storage account, but then have dev, test and prod catalogs in different containers?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;This is a good pattern.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should data be separated based on the requirements for retention and backup?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;Not necessarily, but it can be done. With UC, data retention and backup will mostly rely on your cloud storage retention policies/backup policies. Databricks itself allows for table-level short-term backups (&lt;A href="https://docs.databricks.com/en/delta/history.html)" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/history.html)&lt;/A&gt;&amp;nbsp;while also &lt;STRONG&gt;always&lt;/STRONG&gt; respecting the cloud storage policies.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Or should we separate data on schemas (different containers or storage accounts?)?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;This is a good pattern but you must use a single storage account for storing these schemas.&amp;nbsp;&lt;/P&gt;
&lt;OL class="ol1"&gt;
&lt;LI class="li1"&gt;If a location has been provided for&amp;nbsp;mySchema, it will be stored there.&lt;/LI&gt;
&lt;LI class="li1"&gt;If not, and a location has been provided on&amp;nbsp;myCatalog, it will be stored there.&lt;/LI&gt;
&lt;LI class="li1"&gt;Finally, if no location has been provided on&amp;nbsp;myCatalog, it will be stored in the location associated with the&amp;nbsp;my-region-metastore.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should some schemas not reside in the same storage account as the catalog?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;I don't think that it's possible, but you can have external tables within another storage accounts stored under one of your catalog's schemas. You can also have external volumes (also in separate storage accounts) for storing/fetching files in your Unity Catalog.&lt;BR /&gt;&lt;BR /&gt;Final observations:&lt;/P&gt;
&lt;P class="p1"&gt;Let's say you have a storage account no.1 and no.2. Then you choose no.1 to create your metastore and you create your dev catalog there.&lt;/P&gt;
&lt;P class="p1"&gt;But, you do have some tables in storage account no.2 that you'd like to use in your UC dev catalog stored in storage account no.1.&lt;/P&gt;
&lt;P class="p1"&gt;If this is the case, then you'll be creating an external table on your dev catalog pointing to storage account no.2. But what happens with your data and metadata?&lt;/P&gt;
&lt;P class="p1"&gt;Data -&amp;gt; Stored under storage account no.2&lt;/P&gt;
&lt;P class="p1"&gt;Metadata -&amp;gt; Stored under storage account no.1 (dev catalog in this example)&lt;/P&gt;
&lt;P&gt;Feel free to ask any further questions, if my response addresses your concerns then please mark it as the official solution &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Thu, 21 Mar 2024 23:41:52 GMT</pubDate>
    <dc:creator>raphaelblg</dc:creator>
    <dc:date>2024-03-21T23:41:52Z</dc:date>
    <item>
      <title>Pros and cons of physically separating data in different storage accounts and containers</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/62840#M6852</link>
      <description>&lt;P&gt;When setting up Unity Catalog, Databricks recommends deciding on your data isolation model, that is, how to physically separate your data into different storage accounts and/or containers. There are so many options that it can be hard to be confident in the solution you choose. Some alternatives we are looking into are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Should all catalogs and the metastore reside in the same storage account (but different containers)?&lt;/LI&gt;&lt;LI&gt;Should the metastore have one storage account and other catalogs reside in a different one (separate containers)?&lt;/LI&gt;&lt;LI&gt;Should dev, test and prod catalogs be in different storage accounts?&lt;/LI&gt;&lt;LI&gt;Should one domain (we have catalogs based on domain) be in one storage account, but then have dev, test and prod catalogs in different containers?&lt;/LI&gt;&lt;LI&gt;Should data be separated based on the requirements for retention and backup?&lt;/LI&gt;&lt;LI&gt;Or should we separate data on schemas (different containers or storage accounts?)?&lt;/LI&gt;&lt;LI&gt;Should some schemas not reside in the same storage account as the catalog?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What are your thoughts on this subject? What are the pros and cons of the different methods based on your experience?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 09:28:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/62840#M6852</guid>
      <dc:creator>pernilak</dc:creator>
      <dc:date>2024-03-07T09:28:09Z</dc:date>
    </item>
    <item>
      <title>Re: Pros and cons of physically separating data in different storage accounts and containers</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/62889#M6853</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I think there is no simple answer and it all depends on the use case, but I can give you some hints that I follow:&lt;/P&gt;&lt;P&gt;1) &lt;STRONG&gt;S&lt;SPAN&gt;hould the metastore have one storage account and other catalogs reside in a different one (separate containers)&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&lt;BR /&gt;Avoid central metastore-level storage. It is no longer required and it creates an architectural mess. Focus on assigning a default storage location at least on each catalog. Multiple catalogs can share the same storage.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2) Look at possible storage account limits - if you have a really big system and you put all the data in one storage account, you can hit&amp;nbsp;&lt;STRONG&gt;request limits&lt;/STRONG&gt; and &lt;STRONG&gt;throttling.&lt;/STRONG&gt;&lt;/SPAN&gt; E.g. your jobs or queries can get stuck on those limits.&amp;nbsp;&lt;BR /&gt;Make sure you distribute the workload across several storage accounts; there is no additional "fee" for having multiple storage accounts ... but ...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3) If you plan to use private endpoints, don't create too many storage accounts; separate on containers instead. A private endpoint costs about 8 USD per month, and if you create too many storage accounts you will suddenly pay a lot for idle private endpoints.&lt;/P&gt;&lt;P&gt;4) Make it easy to manage - I find some architectural concepts easier to manage than others. E.g. for data archiving I create a table clone. The clone always lands in a catalog with the suffix _archive. Those catalogs have separate storage, where I set a storage policy to move data to the Cool and/or Archive tier. I apply this policy to the entire storage account. 
Just try to make it easy for yourself.&lt;/P&gt;&lt;P&gt;5) External Location - this can be your only separator for environments / departments when you don't have any strict security requirements.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;6) Cost management - imagine you have multiple divisions. If each division needs to be cross-charged for data (read, write and storage), I find it super easy to create separate storage for each division and charge them for any cost associated with that storage.&amp;nbsp;&lt;BR /&gt;If you don't do this, it is really hard to make this calculation, e.g. by calculating the data file sizes of each table.&lt;/P&gt;&lt;P&gt;7) Environment separation - separate your environments. For a small project without restrictions, I would separate on container level. For bigger projects with more restrictions, I separate on storage account level (then I put the storage accounts in separate subscriptions and VNETs).&lt;BR /&gt;Remember that if you create something like 100 storage accounts and 10 Databricks workspaces, you might have an administration headache allowing the cluster subnets to reach your storage accounts, and that adds another layer to handle when divisions want to share data with each other.&lt;/P&gt;&lt;P&gt;8) Regionalization requirement - this basically means you have to create a separate workspace and storage in the dedicated region (maybe even a separate metastore) and map the relevant catalog / schema to this storage.&lt;/P&gt;&lt;P&gt;9) Schema level - I try to design my metastore(s) in a way that I am not putting schemas in different storage accounts. Still, I assign a separate default location to each schema, using the /&amp;lt;container&amp;gt;/&amp;lt;schema_name&amp;gt;/ storage path.&lt;BR /&gt;But this is because I separate e.g. divisions on catalog level; if you decide to separate divisions on schema level, it would be OK to separate storage on schema level.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 11:46:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/62889#M6853</guid>
      <dc:creator>Wojciech_BUK</dc:creator>
      <dc:date>2024-03-07T11:46:35Z</dc:date>
    </item>
    <item>
      <title>Re: Pros and cons of physically separating data in different storage accounts and containers</title>
      <link>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/64353#M6854</link>
      <description>&lt;P class="p1"&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/92609"&gt;@pernilak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="p1"&gt;Thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should all catalogs and the metastore reside in the same storage account (but different containers)&lt;/FONT&gt;&lt;/P&gt;
&lt;DIV id="tinyMceEditor_37c1f7baeccd6braphaelblg_0" class="mceNonEditable lia-copypaste-placeholder"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="raphaelblg_1-1711062085475.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6752iA0B9079DE6E9ED06/image-size/medium?v=v2&amp;amp;px=400" role="button" title="raphaelblg_1-1711062085475.png" alt="raphaelblg_1-1711062085475.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Yes, Databricks recommends having one separate storage location (container) per catalog, but you can also have a single container for the whole metastore (metastore-level storage). If you need to isolate your data at the infrastructure level (i.e. separate storage accounts), then the best practice is to use &lt;A title="External Locations" href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_self"&gt;External Locations&lt;/A&gt;. Note, however, that you can't create a whole catalog in an external location, only smaller objects such as tables or volumes.&lt;/P&gt;
&lt;P&gt;For information to help you decide whether you need metastore-level storage, see&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/get-started#metastore-storage" target="_blank" rel="noopener" data-linktype="relative-path"&gt;(Optional) Create metastore-level storage&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/best-practices#physically-separate" target="_blank" rel="noopener" data-linktype="relative-path"&gt;Data is physically separated in storage&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should the metastore have one storage account and other catalogs reside in a different one (separate containers)&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;Metastore and catalogs should reside in the same storage account, you can have one container per catalog or one container for all metastore entities, it's up to you to decide. My answer for your question no.1 has the auxiliar doc urls that should help you understand which option is better to you.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should dev, test and prod catalogs be in different storage accounts?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;I don't think that it's possible, if you want to work with separate storage accounts then you should use&amp;nbsp;&lt;A title="External Locations" href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_self"&gt;External Locations.&lt;/A&gt;&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should one domain (we have catalogs based on domain) be in one storage account, but then have dev, test and prod catalogs in different containers?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;This is a good pattern.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should data be separated based on the requirements for retention and backup?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;Not necessarily, but it can be done. With UC, data retention and backup will mostly rely on your cloud storage retention policies/backup policies. Databricks itself allows for table-level short-term backups (&lt;A href="https://docs.databricks.com/en/delta/history.html)" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/history.html)&lt;/A&gt;&amp;nbsp;while also &lt;STRONG&gt;always&lt;/STRONG&gt; respecting the cloud storage policies.&lt;/P&gt;
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Or should we separate data on schemas (different containers or storage accounts?)?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;This is a good pattern but you must use a single storage account for storing these schemas.&amp;nbsp;&lt;/P&gt;
&lt;OL class="ol1"&gt;
&lt;LI class="li1"&gt;If a location has been provided for&amp;nbsp;mySchema, it will be stored there.&lt;/LI&gt;
&lt;LI class="li1"&gt;If not, and a location has been provided on&amp;nbsp;myCatalog, it will be stored there.&lt;/LI&gt;
&lt;LI class="li1"&gt;Finally, if no location has been provided on&amp;nbsp;myCatalog, it will be stored in the location associated with the&amp;nbsp;my-region-metastore.&lt;/LI&gt;
&lt;/OL&gt;
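&lt;P class="p1"&gt;As a minimal sketch of the three-step order in the list above (this is not Databricks code; the function name and the example abfss paths are hypothetical), the fallback from schema to catalog to metastore could be modelled in Python like this:&lt;/P&gt;

```python
def resolve_managed_location(schema_location, catalog_location, metastore_location):
    """Return the first location that is set, checking schema, then catalog, then metastore."""
    for location in (schema_location, catalog_location, metastore_location):
        if location:
            return location
    raise ValueError("no managed storage location is configured at any level")


# Hypothetical paths, for illustration only.
catalog_path = "abfss://dev@account1.dfs.core.windows.net/catalog"
metastore_path = "abfss://meta@account1.dfs.core.windows.net/root"

# No schema-level location is given here, so the catalog-level location is chosen.
chosen = resolve_managed_location(None, catalog_path, metastore_path)
print(chosen)
```

&lt;P class="p1"&gt;The point of the sketch is only that the most specific location wins, so setting a location on a schema overrides whatever is set on its catalog or on the metastore.&lt;/P&gt;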
&lt;P class="p1 lia-indent-padding-left-30px"&gt;&lt;FONT color="#808080"&gt;Should some schemas not reside in the same storage account as the catalog?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="p1"&gt;I don't think that it's possible, but you can have external tables within another storage accounts stored under one of your catalog's schemas. You can also have external volumes (also in separate storage accounts) for storing/fetching files in your Unity Catalog.&lt;BR /&gt;&lt;BR /&gt;Final observations:&lt;/P&gt;
&lt;P class="p1"&gt;Let's say you have a storage account no.1 and no.2. Then you choose no.1 to create your metastore and you create your dev catalog there.&lt;/P&gt;
&lt;P class="p1"&gt;But, you do have some tables in storage account no.2 that you'd like to use in your UC dev catalog stored in storage account no.1.&lt;/P&gt;
&lt;P class="p1"&gt;If this is the case, then you'll be creating an external table on your dev catalog pointing to storage account no.2. But what happens with your data and metadata?&lt;/P&gt;
&lt;P class="p1"&gt;Data -&amp;gt; Stored under storage account no.2&lt;/P&gt;
&lt;P class="p1"&gt;Metadata -&amp;gt; Stored under storage account no.1 (dev catalog in this example)&lt;/P&gt;
&lt;P&gt;Feel free to ask any further questions. If my response addresses your concerns, then please mark it as the official solution &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 23:41:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/pros-and-cons-of-physically-separating-data-in-different-storage/m-p/64353#M6854</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-03-21T23:41:52Z</dc:date>
    </item>
  </channel>
</rss>

