temporary tables or dataframes,

Phani1 — Fri, 26 Apr 2024 09:34:25 GMT

We have to generate over 70 intermediate tables. Should we use temporary tables or dataframes, or should we create delta tables and truncate and reload? Having too many temporary tables could lead to memory problems. In this situation, what is the most effective approach when one intermediate table relies on another?

Re: temporary tables or dataframes,

NandiniN — Wed, 01 May 2024 00:27:34 GMT

Hi Phani1,

The answer depends on your use case, so if possible I would suggest working with your Solution Architect on this, or sharing more details here so we can give better guidance.

By that I mean I would first want to understand whether you really need 70 intermediate tables, or whether a design is possible in which a categorical column distinguishes the rows of one larger table instead of spreading them across multiple tables.

You asked, "or should we create delta tables and truncate and reload?" I take that to mean you don't need the earlier snapshots of the data and the tables would be used only for this transaction. Persisting to a Delta table is worthwhile if non-deterministic functions are involved, because downstream steps then read the stored rows instead of re-evaluating the function and getting unpredictable results. Delta tables are also a good option for debugging, since you can peek into the intermediate results.
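To make the "truncate and reload" idea concrete, here is a minimal PySpark sketch, assuming a Spark session with Delta Lake available (as on Databricks). The helper name `persist_intermediate` and the table name used below are hypothetical and only for illustration.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import DataFrame, SparkSession


def persist_intermediate(
    spark: SparkSession, df: DataFrame, table_name: str
) -> DataFrame:
    """Overwrite a Delta table with df, then return a DataFrame that reads from it.

    Hypothetical helper: the name and signature are assumptions, not a library API.
    """
    (
        df.write.format("delta")
        .mode("overwrite")  # replaces the rows written by the previous run
        .saveAsTable(table_name)
    )
    # Downstream steps now read the stored rows instead of re-running df's plan,
    # so a non-deterministic expression is evaluated only once.
    return spark.table(table_name)
```

For example, `step_01 = persist_intermediate(spark, raw_df, "staging.step_01")` (with `raw_df` and the table name as placeholders) would give the next step a stable input, and the table stays available for inspection after the job finishes.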

Whether to use DataFrames or temporary tables depends on the size of these tables and on how much resource (and cost) you want to allocate to your compute. If they are light enough to be kept in memory, that would be the faster approach.
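On the memory concern: in Spark a temporary view is only a name for a query plan and stores no rows by itself, so memory is consumed when a DataFrame is cached, not when a view is registered. A rough sketch of registering a step and caching it only when it is reused could look like the following. The helper names `register_step` and `release_step` are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import DataFrame, SparkSession


def register_step(df: DataFrame, view_name: str, cache: bool = False) -> DataFrame:
    """Expose df to SQL under view_name; optionally mark it for caching."""
    if cache:
        # cache() is lazy: the data is materialized on the first action.
        df = df.cache()
    df.createOrReplaceTempView(view_name)
    return df


def release_step(spark: SparkSession, df: DataFrame, view_name: str) -> None:
    """Free cached data and drop the view once no later step needs it."""
    df.unpersist()
    spark.catalog.dropTempView(view_name)
```

With around 70 dependent steps, one option would be to cache only the steps that several later steps read, and to call `release_step` as soon as their consumers have run, so that cached data does not accumulate.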

Once again, though, I would emphasize that it would be better for your account owners to get a good understanding of the data and then suggest the most optimized approach for your use case.

Thanks!

topic Re: temporary tables or dataframes, in Get Started Discussions
