Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is it a good idea to use a managed Delta table as a temporary table?

Trey
New Contributor III

Hi all!

I would like to use a managed Delta table as a temporary table, meaning:

  • to create a managed table in the middle of the ETL process
  • to drop the managed table right after the process

This way I can perform MERGE, INSERT, or DELETE operations, which a Spark temp view does not allow.
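
For context, here is a rough sketch of the pattern I have in mind (all database, table, and column names below are just placeholders):

```python
# Rough sketch of the pattern: materialize an intermediate managed Delta table,
# run DML against it, then drop it at the end of the job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Create the intermediate managed table in the middle of the ETL process
staged_df = spark.read.table("source_db.raw_events")          # placeholder source
staged_df.write.format("delta").mode("overwrite").saveAsTable("etl_db.tmp_events")

# 2. Run MERGE / UPDATE / DELETE against it (not possible against a temp view)
spark.sql("""
    MERGE INTO etl_db.tmp_events AS t
    USING source_db.corrections AS c
    ON t.event_id = c.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 3. Publish the result and drop the intermediate table right after the process
spark.read.table("etl_db.tmp_events").write.format("delta") \
    .mode("append").saveAsTable("prod_db.events")
spark.sql("DROP TABLE IF EXISTS etl_db.tmp_events")
```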

Since managed tables keep their data in the Databricks-managed root storage, I'm worried that the data from these tables could hurt performance when the size or number of files gets large (e.g. hitting the S3 API call limit).

Please advise whether it's a good idea to use a managed table as a temporary table in the manner described above.

Thanks in advance; your help will be very much appreciated!

1 ACCEPTED SOLUTION

karthik_p
Esteemed Contributor

@Kwangwon Yi Beyond performance, the main issue with a managed table is that whenever you drop the table, the data under that table is deleted as well.

If you have a good use case for reporting, the best approach is to store the table data in an external storage location: that way the table metadata resides in /user/hive/warehouse, while the data resides in an external location that you can access via /mnt/<xxxx>.
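
For example, something along these lines (the mount path and table names are only placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch: an external (unmanaged) table keeps its metadata in the metastore
# (/user/hive/warehouse) while the data files live at a location you control.
spark.sql("""
    CREATE TABLE IF NOT EXISTS etl_db.tmp_events (
        event_id BIGINT,
        payload  STRING
    )
    USING DELTA
    LOCATION '/mnt/my-bucket/tmp/tmp_events'  -- placeholder mount path
""")

# Dropping the table removes only the metastore entry; the files under
# the external location remain and have to be cleaned up separately.
spark.sql("DROP TABLE IF EXISTS etl_db.tmp_events")
```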

There is also a newer option, Unity Catalog, which lets you keep your managed tables and their data in your own storage; that is the best option for both performance and governance.
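
As a sketch, assuming your account admin has already configured a storage credential and external location (the catalog, schema, and bucket names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch: managed tables created in this Unity Catalog catalog store their data
# under a storage location you own, instead of the workspace root bucket.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS etl_catalog
    MANAGED LOCATION 's3://my-company-bucket/uc-managed/etl'
""")

spark.sql("CREATE SCHEMA IF NOT EXISTS etl_catalog.staging")

# A managed table whose data now lands in your own bucket
spark.sql("""
    CREATE TABLE IF NOT EXISTS etl_catalog.staging.tmp_events (
        event_id BIGINT,
        payload  STRING
    )
""")
```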


2 REPLIES

-werners-
Esteemed Contributor III

Personally I prefer unmanaged tables because I like to have control over the storage location etc.

But I am pretty sure that managed tables will work fine too.

That being said:

I would not serialize the data unless there is a real (performance) need for it. Writing the temp table is an action, and it takes time; then you still have to read the temp table back, process it, and write it again to the target.

Definitely check that this does not have a (large) performance impact.
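
For example, if the intermediate result is only needed as the source of a MERGE into the target, you can often skip materializing it altogether (names below are placeholders):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the intermediate result as a plain DataFrame (no write/read round trip)
updates_df = (
    spark.read.table("source_db.raw_events")        # placeholder source
         .filter("event_date = current_date()")
)

# Use the DataFrame directly as the MERGE source against the Delta target
target = DeltaTable.forName(spark, "prod_db.events")   # placeholder target

(target.alias("t")
       .merge(updates_df.alias("s"), "t.event_id = s.event_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```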

