Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Temporary tables or dataframes

Phani1
Valued Contributor II

We have to generate over 70 intermediate tables. Should we use temporary tables or dataframes, or should we create delta tables and truncate and reload? Having too many temporary tables could lead to memory problems. In this situation, what is the most effective approach when one intermediate table relies on another?

1 REPLY

NandiniN
Databricks Employee

Hi Phani1,

The answer depends on your use case, so if possible I would suggest working with your Solution Architect on this, or sharing some more details here so we can give better guidance.

By that I mean I would first want to understand whether you really need 70 intermediate tables, or whether there is a design where a categorical column distinguishes the rows within one larger table instead of spreading them across multiple tables.
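To make the categorical-column idea concrete, here is a minimal sketch. It assumes the rows that would have gone into separate intermediate tables share a schema and live in one combined DataFrame; the function name and the `category` column name are hypothetical, not something from your workload.

```python
def rows_for_category(combined_df, category, column="category"):
    """Return the slice of one combined table that a separate
    intermediate table would otherwise have held.

    `combined_df` is assumed to be a PySpark DataFrame; `column` is a
    hypothetical name for the categorical column.
    """
    # Filtering on the categorical column replaces "read table N".
    return combined_df.filter(combined_df[column] == category)
```

A call could look like `rows_for_category(spark.table("staging.combined"), "orders")`, where both the table name and the category value are placeholders. Whether this fits depends on how similar the 70 intermediate results really are.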

You asked, "or should we create delta tables and truncate and reload?" I take that to mean you don't need the earlier snapshots of the data and the tables are only needed for this run. Persisting to a Delta table is useful if non-deterministic functions are involved, because the result is computed once and stored rather than re-evaluated each time it is read, which avoids unpredictable results. Delta tables are also a good option for debugging, since you can peek into the intermediate results.
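As a rough sketch of the truncate-and-reload option, assuming a PySpark DataFrame and an active `spark` session are passed in (the function name and table names are hypothetical), an overwrite write replaces the table contents on every run:

```python
def reload_intermediate(spark, df, table_name):
    """Overwrite a Delta table with the rows of `df`, then read it back.

    Reading the table back means the next step depends on the stored
    result instead of recomputing everything upstream of `df`.
    """
    (
        df.write
        .format("delta")
        .mode("overwrite")  # replaces existing rows, similar in effect to truncate and reload
        .saveAsTable(table_name)
    )
    return spark.table(table_name)
```

A dependent step could then start from `step_01 = reload_intermediate(spark, raw_df, "staging.step_01")` and build `step_02` on top of `step_01`, where `raw_df` and the table names are placeholders.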

Whether to use DataFrames or temporary tables depends on the size of these tables and how much resource (and cost) you want to allocate to your compute. If they are light and can be kept in memory, this would be the faster approach.
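For the light, in-memory case, something along these lines might work. It is only a sketch: the helper names and the view name are hypothetical, and `df` is assumed to be a PySpark DataFrame. Releasing the cache once downstream steps are done is one way to limit the memory pressure you mentioned.

```python
def register_in_memory(df, view_name):
    """Cache a small intermediate DataFrame and expose it to SQL
    as a temporary view."""
    cached = df.cache()  # lazy: data is materialized on the first action
    cached.createOrReplaceTempView(view_name)
    return cached


def release(cached_df):
    """Drop the cached data once no later step needs it."""
    cached_df.unpersist()
```

For example, `step_03 = register_in_memory(step_03_df, "step_03")` followed later by `release(step_03)`, with `step_03_df` standing in for whatever DataFrame that step produces.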

But once again, I would like to emphasize that it would be better if the account owners first get a better understanding of the data and then suggest the most optimized approach for your use case.

Thanks!

 
