Hi @Mani2105,
If I create a table in the Sales catalog without specifying any external location, will the table be managed, and will its data go to the Sales storage account?
Yes, if you create a table in the Sales catalog without specifying an external location, the table will be a managed table and its data will be stored in the default storage location configured for the Sales catalog.
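For example, a minimal sketch (the demo schema and orders table here are hypothetical):

# Creating a table without a LOCATION clause makes it a managed table,
# so its files land in the catalog's default managed storage
spark.sql("CREATE TABLE sales.demo.orders (id INT, amount DOUBLE)")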
And in the event of deleting the table, will it delete the associated files as well, and does it auto-optimize?
Yes, in the case of managed tables within Unity Catalog in Databricks, deleting the table will also delete the associated files stored in the catalog's storage location (per the Databricks documentation, the underlying files are removed from cloud storage within 30 days of the drop).
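A quick sketch, reusing the hypothetical table from above:

# Dropping a managed table removes the metadata entry and the underlying data files
spark.sql("DROP TABLE sales.demo.orders")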
Delta Lake's auto-optimization features (autoCompact and optimizeWrite) apply to managed tables. If you've enabled these settings, they continuously optimize the storage layout by compacting small files and writing appropriately sized files, which improves query performance and storage efficiency. Note that Z-ordering is not applied automatically; it requires an explicit OPTIMIZE ... ZORDER BY command.
You can enable auto-optimization at the cluster or session level with Spark configuration settings, or for individual tables with table properties:
# Enabling auto-optimization at the session level
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")    # compact small files after writes
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")  # write better-sized files during writes
Will my metastore USDATA have information about the Sales catalog? If the Sales catalog has a separate storage location, does the metadata about the catalog still go to the metastore?
The USDATA metastore stores metadata for all catalogs in your workspace, including the Sales catalog. Although Sales has a separate storage location, only the metadata about Sales (such as its tables, schemas, and storage path) is stored in the USDATA metastore; the actual data files reside in the storage location designated for Sales.
The metastore (USDATA) holds metadata about all catalogs, schemas, and tables within your Databricks workspace, including:
- Information about each catalog (e.g., Sales).
- Schemas (databases) within each catalog.
- Tables and views, including details such as columns, data types, and table properties.
- Access control configurations, permissions, and security settings.
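For instance, you can inspect this metadata directly (a sketch; the demo schema is the hypothetical one from above):

# Show the catalog metadata the metastore holds, including its storage root
spark.sql("DESCRIBE CATALOG EXTENDED sales").show(truncate=False)

# List the tables the metastore tracks for a schema in the Sales catalog
spark.sql("SHOW TABLES IN sales.demo").show()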
Let me know if you'd like more details on any part of this process!
Regards!
Alfonso Gallardo
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark.