Experiences using managed tables
02-06-2024 01:37 AM
We are looking into the use of managed tables on Databricks. As this decision won't be easy to reverse, I am reaching out to all of you fine folks to learn more about your experience with using them.
If I understand correctly, we don't have to deal with managing the storage, as Databricks will generate GUIDs for schemas and tables. Readability will be worse on the storage itself (using ADLS at the moment), but I don't think that matters so much, as we will still have good readability within the Databricks environment.
Together with the managed tables, we were thinking of using tags together with the built-in metadata so we can build and share the tree structure if needed.
What are the pros and cons of managed tables?
What are some things I should look into before deciding?
- Labels:
  - Delta Lake
  - Workflows
02-06-2024 03:31 AM
Managed tables are tables that are completely managed by Databricks, i.e. if you drop the table from Databricks, the underlying files will also be deleted.
Ideally they should be used in the following cases:
- If you have temporary data that is not critical to your long-term storage or analysis.
- If you have ad-hoc analysis scenarios where data is not required to persist beyond the scope of the analysis.
- If multiple users or teams need to access and work with the same table, it's recommended to use external tables instead of managed tables. External tables provide more flexibility in terms of data sharing and access control.
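To illustrate the difference, here is a minimal sketch in Databricks SQL (the catalog, schema, table, and storage path names are hypothetical examples):

```sql
-- Managed table: Databricks controls the storage location;
-- DROP TABLE also deletes the underlying files.
CREATE TABLE main.analytics.events_managed (
  id BIGINT,
  event_time TIMESTAMP
);

-- External table: the files live at a location you manage;
-- DROP TABLE removes only the metadata, the files remain.
CREATE TABLE main.analytics.events_external (
  id BIGINT,
  event_time TIMESTAMP
)
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/events';
```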
Data engineer at Rsystema
02-06-2024 03:49 AM - edited 02-06-2024 03:58 AM
Thanks for your response @Hkesharwani
In what scenario will we need to drop tables? Can't we just avoid giving drop table privileges to our analysts, superusers and users?
Our current thought is that we will manage access and data lifecycle anyway.
In addition, can't we just use the UNDROP command within 7 days? (We are using UC.)
UNDROP TABLE | Databricks on AWS
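For reference, a minimal sketch of that recovery path in Databricks SQL (table and schema names hypothetical; UNDROP applies to Unity Catalog managed tables within the retention window):

```sql
-- Accidentally drop a UC managed table
DROP TABLE main.analytics.events_managed;

-- Recover it within the retention window
UNDROP TABLE main.analytics.events_managed;

-- If several dropped tables share a name, list them
-- and undrop a specific one by its id
SHOW TABLES DROPPED IN main.analytics;
-- UNDROP TABLE WITH ID '<table-id>';
```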
02-06-2024 06:49 AM
Hi cltj,
As I mentioned, you may drop tables when you only need to save data temporarily. And yes, you can grant only the required access to the team.
I believe this will be a great help for you: https://docs.databricks.com/en/sql/language-manual/sql-ref-privileges.html
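As a sketch of that access-control side (principal and object names are hypothetical), you can grant analysts read and write access without the ownership or MANAGE rights needed to drop the table:

```sql
-- Allow analysts to read and write the table, but not to drop it
-- (dropping requires ownership or MANAGE in Unity Catalog)
GRANT SELECT, MODIFY ON TABLE main.analytics.events_managed TO `analysts`;

-- Let them discover and use the containing catalog and schema
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.analytics TO `analysts`;
```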
Data engineer at Rsystema
02-06-2024 07:37 AM
I would recommend using managed tables for table backups and for tables used for data processing in notebooks that can be dropped at the end of the process, or as a kind of staging table. I have not explored how to copy a managed table from a Dev to a QA environment. In the case of an external table, we can copy the storage folder from the Dev storage account to the QA storage account and create the DDL.
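One option worth checking for the Dev-to-QA case (catalog and table names hypothetical): for Delta managed tables you can clone across catalogs through the metastore instead of copying storage folders, along these lines:

```sql
-- Copy the data and metadata of a managed Delta table
-- from a Dev catalog into a QA catalog
CREATE OR REPLACE TABLE qa.analytics.events_managed
DEEP CLONE dev.analytics.events_managed;
```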
12-09-2024 06:09 PM
Databricks recommends ALWAYS using managed tables UNLESS:
- Your tables are not Delta
- You explicitly need to have the table files in a specific location
Managed Tables are just better... Databricks manages:
- the upgrades (Deletion Vectors? Column Mapping? If they are managed, you will get that)
- the layout (Optimal number of files, optimal clustering to accelerate queries... all with Predictive Optimization)
- things like renaming a table, dropping and undropping
- plus observability and other cool stats provided by UC
There should be NO need whatsoever for users to know WHERE the table data is stored; any operation dealing with a table should go through the Metastore, directly against the table, and not against the files.
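And on the rare occasion you do need to see where Databricks put the data (e.g. for debugging), you can ask the metastore rather than browsing the storage account; a sketch with a hypothetical table name:

```sql
-- Inspect table metadata, including the managed storage location,
-- without touching the storage account directly
DESCRIBE DETAIL main.analytics.events_managed;
-- or
DESCRIBE TABLE EXTENDED main.analytics.events_managed;
```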

