<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic unity catalog guidelines for internal/external tables and multiple workspaces in Data Governance</title>
    <link>https://community.databricks.com/t5/data-governance/unity-catalog-guidelines-for-internal-external-tables-and/m-p/55883#M1509</link>
    <description>&lt;P&gt;We're setting up Unity Catalog from scratch in our Azure infrastructure, which is both:&lt;BR /&gt;- multi-region (Europe, US)&lt;BR /&gt;- multi-environment (dev, qa, prod)&lt;BR /&gt;&lt;BR /&gt;So we set up 2 metastores, one per region: one in West Europe and one in South Central US.&lt;BR /&gt;So far, so good.&lt;BR /&gt;&lt;BR /&gt;Now I have a doubt about how to integrate real data with it.&lt;BR /&gt;In a separate product that we had before Unity Catalog was released, we had:&lt;BR /&gt;- separate ADLS storage accounts per region and environment (so 1 ADLS for DEV, 1 for QA and so on).&lt;BR /&gt;- separate Databricks workspaces (1 for DEV, 1 for QA and so on).&lt;BR /&gt;&lt;BR /&gt;So my first approach would be to bind all these brand new ADLS accounts as external locations.&lt;BR /&gt;That way I'd only have to register catalogs, schemas, tables and volumes in the metastores, which would contain the metadata only, and keep the real data elsewhere.&lt;BR /&gt;Apart from the fact that Databricks would not "own" the data this way, would I have all the Unity Catalog features available, as with internal tables?&lt;BR /&gt;&lt;BR /&gt;If I wanted instead to use the internal storage, that is, the ADLS that backs the metastore, I assume I'd have to put DEV, QA and PROD data in the same store.&lt;BR /&gt;&lt;BR /&gt;So here are the questions:&lt;BR /&gt;&lt;BR /&gt;- What is the suggested way to proceed with naming conventions?
Is it about adding a "DEV, QA, PROD" suffix to catalogs/schemas to distinguish them?&lt;BR /&gt;- How about granting access to the different DEV, QA and PROD catalogs and schemas for different workspaces?&lt;BR /&gt;Is there a way to grant access at the workspace level, or do I need to create users and groups at the metastore level?&lt;BR /&gt;I assume in this case every workspace should have different credentials, and possibly PROD should be accessible only to highly privileged users and to service principals that run PROD workloads and pipelines.&lt;BR /&gt;- What are the performance implications? With internal tables we'd have DEV, QA and PROD data all together, with possibly different retention times and also different workload sizes.&lt;BR /&gt;DEV and PROD workloads would still use the same ADLS, albeit in different containers of course.&lt;BR /&gt;Anyhow, I see it as a problem and a potential source of bottlenecks: having data in different ADLS accounts makes me more comfortable, performance-wise. Am I worrying too much?&lt;/P&gt;</description>
    <pubDate>Thu, 28 Dec 2023 10:51:56 GMT</pubDate>
    <dc:creator>databriccone</dc:creator>
    <dc:date>2023-12-28T10:51:56Z</dc:date>
    <item>
      <title>unity catalog guidelines for internal/external tables and multiple workspaces</title>
      <link>https://community.databricks.com/t5/data-governance/unity-catalog-guidelines-for-internal-external-tables-and/m-p/55883#M1509</link>
      <description>&lt;P&gt;We're setting up Unity Catalog from scratch in our Azure infrastructure, which is both:&lt;BR /&gt;- multi-region (Europe, US)&lt;BR /&gt;- multi-environment (dev, qa, prod)&lt;BR /&gt;&lt;BR /&gt;So we set up 2 metastores, one per region: one in West Europe and one in South Central US.&lt;BR /&gt;So far, so good.&lt;BR /&gt;&lt;BR /&gt;Now I have a doubt about how to integrate real data with it.&lt;BR /&gt;In a separate product that we had before Unity Catalog was released, we had:&lt;BR /&gt;- separate ADLS storage accounts per region and environment (so 1 ADLS for DEV, 1 for QA and so on).&lt;BR /&gt;- separate Databricks workspaces (1 for DEV, 1 for QA and so on).&lt;BR /&gt;&lt;BR /&gt;So my first approach would be to bind all these brand new ADLS accounts as external locations.&lt;BR /&gt;That way I'd only have to register catalogs, schemas, tables and volumes in the metastores, which would contain the metadata only, and keep the real data elsewhere.&lt;BR /&gt;Apart from the fact that Databricks would not "own" the data this way, would I have all the Unity Catalog features available, as with internal tables?&lt;BR /&gt;&lt;BR /&gt;If I wanted instead to use the internal storage, that is, the ADLS that backs the metastore, I assume I'd have to put DEV, QA and PROD data in the same store.&lt;BR /&gt;&lt;BR /&gt;So here are the questions:&lt;BR /&gt;&lt;BR /&gt;- What is the suggested way to proceed with naming conventions?
Is it about adding a "DEV, QA, PROD" suffix to catalogs/schemas to distinguish them?&lt;BR /&gt;- How about granting access to the different DEV, QA and PROD catalogs and schemas for different workspaces?&lt;BR /&gt;Is there a way to grant access at the workspace level, or do I need to create users and groups at the metastore level?&lt;BR /&gt;I assume in this case every workspace should have different credentials, and possibly PROD should be accessible only to highly privileged users and to service principals that run PROD workloads and pipelines.&lt;BR /&gt;- What are the performance implications? With internal tables we'd have DEV, QA and PROD data all together, with possibly different retention times and also different workload sizes.&lt;BR /&gt;DEV and PROD workloads would still use the same ADLS, albeit in different containers of course.&lt;BR /&gt;Anyhow, I see it as a problem and a potential source of bottlenecks: having data in different ADLS accounts makes me more comfortable, performance-wise. Am I worrying too much?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Dec 2023 10:51:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/unity-catalog-guidelines-for-internal-external-tables-and/m-p/55883#M1509</guid>
      <dc:creator>databriccone</dc:creator>
      <dc:date>2023-12-28T10:51:56Z</dc:date>
    </item>
  </channel>
</rss>

