10-10-2022 07:19 PM
Hi all!
I would like to use a managed delta table as a temporal table, meaning:
This way I can perform merge, insert, or delete oprations better than when using spark temp view which doesn't allow users to perform them.
Since the managed tables stay in the control plane of databricks, I'm worried that the data from managed tables affects the control plane performance when the size or number of files are large (e.g. s3 api call limit).
Please provide me some advice if it's a good idea using the managed table as a temporal table in the manners that I mentioned above.
Thanks for your help in advance and your help will be very appreciated!
10-13-2022 07:12 AM
@Kwangwon Yi Instead of performance, main issue with managed table is whenever you delete table, data under that table gets deleted.
If you have good use case on Reporting, best approach is to go with external storage location to store your managed table data, in that way your table metadata resides on /user/hive/warehouse, where your data resides on some external location which you can access by using /mnt/<xxxx>
we have a new option called Unity catalog, where you can store your managed table and data in your own storage that is best option to overcome performance and governance.
10-11-2022 06:08 AM
Personally I prefer unmanaged tables because I like to have control over the storage location etc.
But I am pretty sure that managed tables will work fine too.
That being said:
I would not serialize data unless there is a need for it (performance wise). The write of the temp table is an action. It takes time. Then, reading the temp table, process it and write it again to the target.
Certainly check if that does not have a (large) performance impact.
10-13-2022 07:12 AM
@Kwangwon Yi Instead of performance, main issue with managed table is whenever you delete table, data under that table gets deleted.
If you have good use case on Reporting, best approach is to go with external storage location to store your managed table data, in that way your table metadata resides on /user/hive/warehouse, where your data resides on some external location which you can access by using /mnt/<xxxx>
we have a new option called Unity catalog, where you can store your managed table and data in your own storage that is best option to overcome performance and governance.
10-18-2022 04:30 AM
Hi @Kwangwon Yi , We haven’t heard from you since the last response from @Werner Stinckens and @karthik p, and I was checking back to see if you have a resolution yet.
If you have any solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.
Click here to register and join today!
Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.