<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: temporary tables or dataframes, in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/temporary-tables-or-dataframes/m-p/67765#M7180</link>
    <description>&lt;P&gt;Hi &lt;A id="link_7" class="lia-link-navigation lia-page-link lia-user-name-link" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36892" target="_self" aria-label="View Profile of Phani1"&gt;&lt;SPAN class=""&gt;Phani1,&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;The answer depends on your use case, so if possible I would suggest working with your Solution Architect on this, or sharing more details here for better guidance.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;By that I mean I would like to understand whether you really need 70 intermediate tables, or whether a design is possible in which a categorical column distinguishes the rows within one larger table instead of using multiple tables.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;You asked, "&lt;SPAN&gt;or should we create delta tables and &lt;STRONG&gt;truncate and reload&lt;/STRONG&gt;?" I understand that you don't need earlier snapshots of the data and that it is only needed for this transaction. Persisting to a Delta table is useful when non-deterministic functions are involved, because the stored result is computed once rather than re-evaluated, which avoids unpredictable results. Delta tables are also a good option for debugging and inspecting the intermediate results.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Whether to use DataFrames or temporary tables depends on the size of these tables and how much resource (and cost) you want to allocate to your compute. If they are small enough to be kept in memory, this is the faster approach.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Once again, I would like to emphasize that it would be better for the Account owners to get a better understanding of your data and then suggest the most optimized approach for your use case.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Thanks!&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
</description>
    <pubDate>Wed, 01 May 2024 00:27:34 GMT</pubDate>
    <dc:creator>NandiniN</dc:creator>
    <dc:date>2024-05-01T00:27:34Z</dc:date>
    <item>
      <title>temporary tables or dataframes,</title>
      <link>https://community.databricks.com/t5/get-started-discussions/temporary-tables-or-dataframes/m-p/67369#M7179</link>
      <description>&lt;P&gt;We have to generate over 70 intermediate tables. Should we use temporary tables or dataframes, or should we create delta tables and truncate and reload? Having too many temporary tables could lead to memory problems. In this situation, what is the most effective approach when one intermediate table relies on another?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Apr 2024 09:34:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/temporary-tables-or-dataframes/m-p/67369#M7179</guid>
      <dc:creator>Phani1</dc:creator>
      <dc:date>2024-04-26T09:34:25Z</dc:date>
    </item>
    <item>
      <title>Re: temporary tables or dataframes,</title>
      <link>https://community.databricks.com/t5/get-started-discussions/temporary-tables-or-dataframes/m-p/67765#M7180</link>
      <description>&lt;P&gt;Hi &lt;A id="link_7" class="lia-link-navigation lia-page-link lia-user-name-link" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36892" target="_self" aria-label="View Profile of Phani1"&gt;&lt;SPAN class=""&gt;Phani1,&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;The answer depends on your use case, so if possible I would suggest working with your Solution Architect on this, or sharing more details here for better guidance.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;By that I mean I would like to understand whether you really need 70 intermediate tables, or whether a design is possible in which a categorical column distinguishes the rows within one larger table instead of using multiple tables.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;You asked, "&lt;SPAN&gt;or should we create delta tables and &lt;STRONG&gt;truncate and reload&lt;/STRONG&gt;?" I understand that you don't need earlier snapshots of the data and that it is only needed for this transaction. Persisting to a Delta table is useful when non-deterministic functions are involved, because the stored result is computed once rather than re-evaluated, which avoids unpredictable results. Delta tables are also a good option for debugging and inspecting the intermediate results.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Whether to use DataFrames or temporary tables depends on the size of these tables and how much resource (and cost) you want to allocate to your compute. If they are small enough to be kept in memory, this is the faster approach.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Once again, I would like to emphasize that it would be better for the Account owners to get a better understanding of your data and then suggest the most optimized approach for your use case.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;Thanks!&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
</description>
      <pubDate>Wed, 01 May 2024 00:27:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/temporary-tables-or-dataframes/m-p/67765#M7180</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2024-05-01T00:27:34Z</dc:date>
    </item>
  </channel>
</rss>

