10-10-2022 07:19 PM
Hi all!
I would like to use a managed Delta table as a temporary table, meaning:
- create a managed table in the middle of an ETL process
- drop the managed table right after the process finishes
This way I can run merge, insert, or delete operations, which a Spark temp view does not support.
Since managed tables stay in the control plane of Databricks, I'm worried that their data could affect control plane performance when the size or number of files is large (e.g. S3 API call limits).
Please advise whether it is a good idea to use a managed table as a temporary table in the way I described above.
Thanks in advance for your help; it will be much appreciated!
- Labels: Control Plane, Delta, Managed Table
Accepted Solutions
10-13-2022 07:12 AM
@Kwangwon Yi Rather than performance, the main issue with a managed table is that whenever you drop the table, the data under it is deleted as well.
If you have a good reporting use case, the best approach is to store the table data in an external storage location. That way the table metadata resides in /user/hive/warehouse, while the data resides in an external location that you can access through /mnt/<xxxx>.
There is also a newer option, Unity Catalog, where you can keep your managed tables and their data in your own storage. That is the best option for addressing both performance and governance.
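To make the external-location approach concrete, here is a minimal sketch of the create/drop lifecycle for an external Delta table. The function name `external_staging_statements`, the table and view names, and the `etl_staging` subfolder are hypothetical and not from the thread; the code only builds SQL strings, so it assumes you run them yourself with `spark.sql` in a notebook.

```python
from typing import List


def external_staging_statements(table: str, location: str, source_view: str) -> List[str]:
    """Build the SQL for an external (unmanaged) Delta table used as a staging table.

    All names are illustrative assumptions; adapt them to your workspace.
    """
    return [
        # LOCATION makes the table external: the metastore keeps only the
        # table's metadata, and the data files are written under `location`.
        f"CREATE OR REPLACE TABLE {table} USING DELTA "
        f"LOCATION '{location}' AS SELECT * FROM {source_view}",
        # Dropping an external table removes the metastore entry only;
        # the files under `location` stay until you delete them yourself.
        f"DROP TABLE IF EXISTS {table}",
    ]


# Usage in a notebook (assumes an active SparkSession called `spark`
# and a temp view called `updates_view`; both are assumptions):
#
#   create_sql, drop_sql = external_staging_statements(
#       "etl_staging", "/mnt/<xxxx>/etl_staging", "updates_view")
#   spark.sql(create_sql)
#   ...  # merge / insert / delete against etl_staging
#   spark.sql(drop_sql)
```

Keep in mind that with an external table the files remain after the drop, so for a purely temporary table you would also need to clean up the storage path yourself.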
10-11-2022 06:08 AM
Personally I prefer unmanaged tables because I like to have control over the storage location etc.
But I am pretty sure that managed tables will work fine too.
That being said:
I would not write intermediate data out to a table unless there is a need for it (performance-wise). Writing the temp table is an action, and it takes time. After that you still have to read the temp table, process it, and write the result to the target.
Be sure to check whether that has a (large) performance impact.
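One way to check that impact is to time each step of the staging flow. The sketch below is only an illustration under assumptions: `run_with_staging_table` and all table, view, and column names are hypothetical, and `execute` is expected to be something like `spark.sql`. It writes the staging table, merges it into the target, and drops the staging table in a `finally` block so it is cleaned up even if the merge fails.

```python
import time
from typing import Callable, Dict


def run_with_staging_table(
    execute: Callable[[str], object],
    staging_table: str,
    source_view: str,
    target_table: str,
    join_condition: str,
) -> Dict[str, float]:
    """Materialize a view, merge it into the target, and always drop the staging table.

    Returns wall-clock seconds per step so the cost of the extra write is visible.
    """
    timings: Dict[str, float] = {}

    def timed(step: str, sql: str) -> None:
        start = time.perf_counter()
        execute(sql)
        timings[step] = time.perf_counter() - start

    try:
        # Extra write: this is the step that a temp view would avoid.
        timed(
            "write_staging",
            f"CREATE OR REPLACE TABLE {staging_table} USING DELTA "
            f"AS SELECT * FROM {source_view}",
        )
        timed(
            "merge",
            f"MERGE INTO {target_table} AS t USING {staging_table} AS s "
            f"ON {join_condition} "
            "WHEN MATCHED THEN UPDATE SET * "
            "WHEN NOT MATCHED THEN INSERT *",
        )
    finally:
        # Runs even if the write or merge raised, so no staging table is left behind.
        timed("drop_staging", f"DROP TABLE IF EXISTS {staging_table}")
    return timings


# Usage in a notebook (assumes an active SparkSession called `spark`):
#
#   timings = run_with_staging_table(
#       spark.sql, "etl_staging", "updates_view", "sales", "t.id = s.id")
#   print(timings)
```

If `write_staging` dominates the total, it may be worth merging directly from the DataFrame or view instead of materializing it first.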

