02-06-2024 01:37 AM
We are looking into the use of managed tables on Databricks. As this decision won't be easy to reverse, I am reaching out to all of you fine folks to learn more about your experience with them.
If I understand correctly, we don't have to deal with managing the storage, as Databricks will generate GUIDs for the schema and table folders. Readability will be worse on the storage itself (we are using ADLS at the moment), but I don't think that matters much, as we will still have good readability within the Databricks environment.
Together with managed tables, we were thinking of using tags along with the built-in metadata so we can build and share the tree structure if needed.
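Building that tree from the built-in metadata could look roughly like the sketch below. It assumes you have already collected `(table_catalog, table_schema, table_name)` rows from a query such as `SELECT table_catalog, table_schema, table_name FROM system.information_schema.tables`, and optionally tag rows `(catalog_name, schema_name, table_name, tag_name, tag_value)` from `system.information_schema.table_tags`; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def build_tree(table_rows, tag_rows=()):
    """Nest catalog -> schema -> table and attach each table's tags.

    table_rows: iterable of (catalog, schema, table) tuples.
    tag_rows: iterable of (catalog, schema, table, tag_name, tag_value) tuples.
    """
    # Index tags by their fully qualified table name.
    tags = defaultdict(dict)
    for catalog, schema, table, tag_name, tag_value in tag_rows:
        tags[(catalog, schema, table)][tag_name] = tag_value

    tree = {}
    for catalog, schema, table in table_rows:
        tables = tree.setdefault(catalog, {}).setdefault(schema, {})
        # Copy the tag dict so the tree does not share state with the index.
        tables[table] = dict(tags.get((catalog, schema, table), {}))
    return tree


# Hypothetical rows, shaped like the information_schema query results.
rows = [("main", "sales", "orders"), ("main", "sales", "customers"), ("main", "hr", "staff")]
tag_rows = [("main", "sales", "orders", "domain", "sales")]
tree = build_tree(rows, tag_rows)
print(tree)
```

Because the tree is plain nested dicts, it can be serialized to JSON and shared outside Databricks if needed.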
What are the pros and cons of managed tables?
What are some things I should look into before deciding?
02-06-2024 03:31 AM
Managed tables are tables that are completely managed by Databricks, i.e. if we drop the table in Databricks, the underlying files are also deleted.
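The difference shows up in the DDL: a managed table has no `LOCATION` clause, while an external table points at a storage path you control. The helper below is a hypothetical sketch that only builds the SQL text; the table names, storage account and container in the usage lines are made up.

```python
def create_table_ddl(full_name, columns, location=None):
    """Return a CREATE TABLE statement.

    Without `location` the table is managed: the metastore chooses the
    storage path and DROP TABLE also deletes the data files. With
    `location` the table is external: DROP TABLE removes only the metadata.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE {full_name} ({cols})"
    if location is not None:
        ddl += f" LOCATION '{location}'"
    return ddl


managed = create_table_ddl("main.sales.orders", [("id", "BIGINT"), ("amount", "DECIMAL(10,2)")])
external = create_table_ddl(
    "main.sales.orders_ext",
    [("id", "BIGINT")],
    location="abfss://data@mystorageaccount.dfs.core.windows.net/sales/orders_ext",
)
print(managed)
print(external)
```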
Ideally, they should be used in the following cases:
02-06-2024 03:49 AM - edited 02-06-2024 03:58 AM
Thanks for your response @Hkesharwani
In what scenario would we need to drop tables? Can't we just avoid giving DROP TABLE privileges to our analysts, superusers and users?
Our current thinking is that we will manage access and the data lifecycle anyway.
In addition, can't we just use the UNDROP command within 7 days? (We are using UC.)
UNDROP TABLE | Databricks on AWS
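A rough sketch of that safety net: given when a table was dropped (for example as listed by `SHOW TABLES DROPPED IN <schema>`), check whether it is still inside the 7-day window mentioned above before issuing `UNDROP TABLE`. The function name and sample timestamps are hypothetical, and the window is only approximated here; Databricks enforces the actual cutoff.

```python
from datetime import datetime, timedelta, timezone

# Retention window as stated in this thread; Databricks enforces the real cutoff.
RETENTION = timedelta(days=7)


def undrop_statement(full_name, dropped_at, now=None):
    """Return an UNDROP TABLE statement if `dropped_at` is still inside the
    retention window, otherwise None."""
    now = now or datetime.now(timezone.utc)
    if now - dropped_at > RETENTION:
        return None
    return f"UNDROP TABLE {full_name}"


dropped = datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc)
recent = undrop_statement("main.sales.orders", dropped, now=datetime(2024, 2, 6, tzinfo=timezone.utc))
too_late = undrop_statement("main.sales.orders", dropped, now=datetime(2024, 2, 10, tzinfo=timezone.utc))
print(recent)    # UNDROP TABLE main.sales.orders
print(too_late)  # None
```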
02-06-2024 06:49 AM
Hi cltj,
As I mentioned, you may need to drop tables when you only have to keep data for temporary purposes. And yes, you can grant each team only the access it requires.
I believe this page will be a great help for you: https://docs.databricks.com/en/sql/language-manual/sql-ref-privileges.html
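For example, read-only access for an analyst group could be granted roughly as below. This is a sketch that only generates the SQL; the catalog, schema and group names are hypothetical. It relies on the group not being the owner of the table, schema or catalog, since in Unity Catalog dropping a table is reserved for owners and other highly privileged principals.

```python
def read_only_grants(catalog, schema, group):
    """Return GRANT statements that let `group` query every table in
    catalog.schema without making it an owner of anything."""
    principal = f"`{group}`"
    return [
        # USE CATALOG / USE SCHEMA are needed to reach the objects at all.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        # SELECT on the schema is inherited by its current and future tables.
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {principal}",
    ]


stmts = read_only_grants("main", "sales", "analysts")
for stmt in stmts:
    print(stmt)
```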
02-06-2024 07:37 AM
I would recommend using managed tables for table backups, for tables used for data processing in notebooks that can be dropped at the end of the process, and for staging tables. I have not explored how to copy a managed table from a Dev to a QA environment. In the case of an external table, we can copy the storage folder from the Dev storage account to the QA storage account and recreate the table with its DDL.
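One option worth testing for the managed-table case is Delta's `DEEP CLONE`, which copies both data and metadata into a new table, so no storage folders need to be moved by hand. The sketch below assumes the Dev and QA catalogs (hypothetically named `dev` and `qa`) are visible from the same Unity Catalog metastore; it only builds the statement and has not been verified against a real Dev/QA setup.

```python
def deep_clone_statement(source, target, replace=True):
    """Return a DEEP CLONE statement copying `source` (data and metadata)
    into `target`."""
    verb = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE IF NOT EXISTS"
    return f"{verb} {target} DEEP CLONE {source}"


stmt = deep_clone_statement("dev.sales.orders", "qa.sales.orders")
print(stmt)  # CREATE OR REPLACE TABLE qa.sales.orders DEEP CLONE dev.sales.orders
```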
2 weeks ago
Databricks recommends ALWAYS using Managed Tables UNLESS:
Managed Tables are just better... Databricks manages:
There should be NO need whatsoever for users to know WHERE the table data is stored; any operation on a table should go through the Metastore and act on the table itself, not on its files.
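In practice that means code references tables by their three-level name (in a notebook, `spark.table("main.sales.orders")` rather than `spark.read.load(<path>)`). The hypothetical helper below sketches one way to enforce this in shared code: it builds a quoted `catalog.schema.table` reference and rejects anything that looks like a storage path.

```python
def table_ref(catalog, schema, table):
    """Return a backtick-quoted three-level name, e.g. `main`.`sales`.`orders`.

    Storage paths such as abfss://... are rejected on purpose: tables should
    be addressed through the metastore, not through their files.
    """
    parts = (catalog, schema, table)
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"expected a table name part, not a path: {part!r}")
    # A backtick inside an identifier is escaped by doubling it.
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


ref = table_ref("main", "sales", "orders")
query = f"SELECT COUNT(*) FROM {ref}"
print(query)  # SELECT COUNT(*) FROM `main`.`sales`.`orders`
```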