Is it a good idea to use a managed Delta table as a temporary table?

Trey
New Contributor III

Hi all!

I would like to use a managed Delta table as a temporary table, meaning:

  • to create a managed table in the middle of the ETL process
  • to drop the managed table right after the process completes

This way I can run merge, insert, or delete operations against it, which isn't possible with a Spark temp view.
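Roughly, the pattern I have in mind looks like this (a sketch only; the table and column names are placeholders):

# 1. Materialize the intermediate result as a managed Delta table mid-ETL
intermediate_df.write.format("delta").mode("overwrite").saveAsTable("etl_stage")

# 2. Run DML against it, which a temp view would not allow
spark.sql("DELETE FROM etl_stage WHERE amount IS NULL")
spark.sql("""
    MERGE INTO target AS t
    USING etl_stage AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 3. Drop the table once the run is finished (this also removes its data files)
spark.sql("DROP TABLE IF EXISTS etl_stage")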

Since managed tables keep their data in the Databricks-managed root storage, I'm worried that a managed table could hurt performance when the data size or number of files is large (e.g. hitting S3 API call limits).

Please advise whether it's a good idea to use a managed table as a temporary table in the way described above.

Thanks in advance; your help will be much appreciated!


3 REPLIES

-werners-
Esteemed Contributor III

Personally I prefer unmanaged tables because I like to have control over the storage location etc.

But I am pretty sure that managed tables will work fine too.

That being said:

I would not serialize data unless there is a need for it (performance-wise). Writing the temp table is an action and it takes time; then you still have to read the temp table back, process it, and write it again to the target.

So definitely check whether that has a (large) performance impact.
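For example, when the intermediate data is only needed as the source of a MERGE, a temp view avoids the extra write and read entirely (a sketch only, with placeholder names):

# Keep the intermediate result lazy: expose it as a temp view and
# merge from it directly, instead of materializing a staging table first.
intermediate_df.createOrReplaceTempView("etl_stage_view")
spark.sql("""
    MERGE INTO target AS t
    USING etl_stage_view AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")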

karthik_p (Accepted Solution)
Esteemed Contributor

@Kwangwon Yi​ More than performance, the main issue with a managed table is that whenever you drop the table, the data under it gets deleted as well.

If you have a good use case, such as reporting, the best approach is to store the table data in an external storage location. That way your table metadata resides under /user/hive/warehouse, while your data resides in an external location that you can access via a mount path like /mnt/<xxxx>.
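For example, something along these lines (the mount path and table name are placeholders) keeps only the metadata in the metastore while the files live in your own storage:

# Creating the table with an explicit LOCATION makes it external:
# dropping it later removes only the metadata, not the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS reporting_stage (id BIGINT, amount DOUBLE)
    USING DELTA
    LOCATION '/mnt/<xxxx>/reporting_stage'
""")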

There is also a newer option, Unity Catalog, which lets you keep managed tables and their data in your own storage. That is the best option for both performance and governance.
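With Unity Catalog you can, for instance, point a catalog at your own bucket so that even managed tables land there. A rough sketch (assuming an external location and storage credential have already been set up for the bucket; all names are placeholders):

# Managed tables created under this catalog store their data in the
# MANAGED LOCATION below, i.e. in your own storage account/bucket.
spark.sql("CREATE CATALOG IF NOT EXISTS etl_catalog MANAGED LOCATION 's3://my-bucket/etl_catalog'")
spark.sql("CREATE SCHEMA IF NOT EXISTS etl_catalog.staging")
spark.sql("CREATE TABLE IF NOT EXISTS etl_catalog.staging.etl_stage (id BIGINT, amount DOUBLE)")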

Kaniz
Community Manager

Hi @Kwangwon Yi​, we haven't heard from you since the last responses from @Werner Stinckens​ and @karthik p​, and I was checking back to see if you have found a resolution yet.

If you have any solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
