Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is it a good idea to use a managed Delta table as a temporary table?

Trey
New Contributor III

Hi all!

I would like to use a managed Delta table as a temporary table, meaning:

  • to create a managed table in the middle of the ETL process
  • to drop the managed table right after the process

This way I can perform MERGE, INSERT, or DELETE operations, which a Spark temp view does not allow.
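
For context, here is a rough sketch of the pattern I have in mind (all database, table, and column names below are just placeholders):

```python
# Rough sketch of the pattern: materialize an intermediate managed Delta table,
# run DML against it, then drop it at the end of the job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Create the intermediate managed table in the middle of the ETL process
staged_df = spark.read.table("source_db.raw_events")          # placeholder source
staged_df.write.format("delta").mode("overwrite").saveAsTable("etl_db.tmp_events")

# 2. Run MERGE / UPDATE / DELETE against it (not possible against a temp view)
spark.sql("""
    MERGE INTO etl_db.tmp_events AS t
    USING source_db.corrections AS c
    ON t.event_id = c.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 3. Publish the result and drop the intermediate table right after the process
spark.read.table("etl_db.tmp_events").write.format("delta") \
    .mode("append").saveAsTable("prod_db.events")
spark.sql("DROP TABLE IF EXISTS etl_db.tmp_events")
```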

Since managed tables keep their data in the Databricks-managed root storage, I'm worried that the data from these tables could hurt performance when the size or number of files gets large (e.g. hitting the S3 API call limit).

Please advise whether it's a good idea to use a managed table as a temporary table in the manner described above.

Thanks in advance; your help will be very much appreciated!

1 ACCEPTED SOLUTION

karthik_p
Esteemed Contributor

@Kwangwon Yi Beyond performance, the main issue with a managed table is that whenever you drop the table, the data under that table is deleted as well.

If you have a good use case for reporting, the best approach is to store the table data in an external storage location: that way the table metadata resides in /user/hive/warehouse, while the data resides in an external location that you can access via /mnt/<xxxx>.
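
For example, something along these lines (the mount path and table names are only placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch: an external (unmanaged) table keeps its metadata in the metastore
# (/user/hive/warehouse) while the data files live at a location you control.
spark.sql("""
    CREATE TABLE IF NOT EXISTS etl_db.tmp_events (
        event_id BIGINT,
        payload  STRING
    )
    USING DELTA
    LOCATION '/mnt/my-bucket/tmp/tmp_events'  -- placeholder mount path
""")

# Dropping the table removes only the metastore entry; the files under
# the external location remain and have to be cleaned up separately.
spark.sql("DROP TABLE IF EXISTS etl_db.tmp_events")
```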

There is also a newer option, Unity Catalog, which lets you keep your managed tables and their data in your own storage; that is the best option for both performance and governance.
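
As a sketch, assuming your account admin has already configured a storage credential and external location (the catalog, schema, and bucket names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch: managed tables created in this Unity Catalog catalog store their data
# under a storage location you own, instead of the workspace root bucket.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS etl_catalog
    MANAGED LOCATION 's3://my-company-bucket/uc-managed/etl'
""")

spark.sql("CREATE SCHEMA IF NOT EXISTS etl_catalog.staging")

# A managed table whose data now lands in your own bucket
spark.sql("""
    CREATE TABLE IF NOT EXISTS etl_catalog.staging.tmp_events (
        event_id BIGINT,
        payload  STRING
    )
""")
```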


2 REPLIES

-werners-
Esteemed Contributor III

Personally I prefer unmanaged tables because I like to have control over the storage location etc.

But I am pretty sure that managed tables will work fine too.

That being said:

I would not serialize the data unless there is a real (performance) need for it. Writing the temp table is an action, and it takes time; then you still have to read the temp table back, process it, and write it again to the target.

Definitely check that this does not have a (large) performance impact.
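
For example, if the intermediate result is only needed as the source of a MERGE into the target, you can often skip materializing it altogether (names below are placeholders):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the intermediate result as a plain DataFrame (no write/read round trip)
updates_df = (
    spark.read.table("source_db.raw_events")        # placeholder source
         .filter("event_date = current_date()")
)

# Use the DataFrame directly as the MERGE source against the Delta target
target = DeltaTable.forName(spark, "prod_db.events")   # placeholder target

(target.alias("t")
       .merge(updates_df.alias("s"), "t.event_id = s.event_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```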

