<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Which is quicker: grouping a table that is a join of several others or querying data? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28780#M20552</link>
    <description>&lt;P&gt;Hi @Marcos Dias&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Frankly, I think we need more detail to answer your question:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is the data in these 4 dataframes updated over time?&lt;/LI&gt;&lt;LI&gt;How often do you use the groupBy dataframe?&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Tue, 15 Nov 2022 09:14:38 GMT</pubDate>
    <dc:creator>NhatHoang</dc:creator>
    <dc:date>2022-11-15T09:14:38Z</dc:date>
    <item>
      <title>Which is quicker: grouping a table that is a join of several others or querying data?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28777#M20549</link>
      <description>&lt;P&gt;This may be a tricky question, so please bear with me.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In a real-life scenario, I have a &lt;B&gt;dataframe&lt;/B&gt; (I'm using &lt;B&gt;pyspark&lt;/B&gt;) called &lt;B&gt;age&lt;/B&gt;, which is a &lt;B&gt;groupBy &lt;/B&gt;of 4 other dataframes. I join these 4, so at the end I have a few million rows, but after the &lt;B&gt;groupBy &lt;/B&gt;the number is reduced to some 200 rows.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I then save this dataframe to an S3 bucket.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The question now is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Which is quicker: performing further &lt;B&gt;groupBy &lt;/B&gt;operations on this dataframe, or querying the data I just saved in S3 and then applying the &lt;B&gt;groupBy &lt;/B&gt;to it?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The final goal is to save this second &lt;B&gt;groupBy &lt;/B&gt;in S3 too.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Oct 2022 14:52:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28777#M20549</guid>
      <dc:creator>markdias</dc:creator>
      <dc:date>2022-10-06T14:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: Which is quicker: grouping a table that is a join of several others or querying data?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28778#M20550</link>
      <description>&lt;P&gt;I don't understand what you mean by a &lt;B&gt;groupBy &lt;/B&gt;of 4 other dataframes. Could you share the code?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It is usually faster to process everything in one go.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2022 12:12:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28778#M20550</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-14T12:12:42Z</dc:date>
    </item>
    <item>
      <title>Re: Which is quicker: grouping a table that is a join of several others or querying data?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28779#M20551</link>
      <description>&lt;P&gt;Hi @Marcos Dias&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope all is well!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Was the response from&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WW82AAG" alt="https://community.databricks.com/s/profile/0053f000000WW82AAG" target="_blank"&gt;@Hubert Dudek&lt;/A&gt;&amp;nbsp;(Customer) able to resolve your issue? If so, would you be happy to share the solution or&amp;nbsp;&lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 15 Nov 2022 08:48:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28779#M20551</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-15T08:48:59Z</dc:date>
    </item>
    <item>
      <title>Re: Which is quicker: grouping a table that is a join of several others or querying data?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28780#M20552</link>
      <description>&lt;P&gt;Hi @Marcos Dias&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Frankly, I think we need more detail to answer your question:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is the data in these 4 dataframes updated over time?&lt;/LI&gt;&lt;LI&gt;How often do you use the groupBy dataframe?&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 15 Nov 2022 09:14:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-is-quicker-grouping-a-table-that-is-a-join-of-several/m-p/28780#M20552</guid>
      <dc:creator>NhatHoang</dc:creator>
      <dc:date>2022-11-15T09:14:38Z</dc:date>
    </item>
  </channel>
</rss>

