Which is quicker: grouping a table that is a join of several others, or querying the saved data?

markdias
New Contributor II

This may be a tricky question, so please bear with me.

In a real-life scenario, I have a DataFrame (I'm using PySpark) called age, which is a groupBy of 4 other DataFrames. I join these 4, so at the end I have a few million rows, but after the groupBy the count drops to about 200 rows.

I then save this DataFrame to an S3 bucket.

The question now is:

Which is quicker: performing further groupBy operations on this DataFrame, or reading the data I just saved to S3 and then applying the groupBy to it?

The final goal is to save the result of this second groupBy to S3 as well.


Hubert-Dudek
Esteemed Contributor III

I don't understand what you mean by a DataFrame that is a groupBy of 4 other DataFrames. Can you share your code?

Usually it will be faster to process everything in one go.

Anonymous
Not applicable

Hi @Marcos Dias,

Hope all is well!

Was @Hubert Dudek's response able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

NhatHoang
Valued Contributor II

Hi @Marcos Dias,

Frankly, I think we need more detail to answer your question:

  • Is the data in these 4 DataFrames updated over time?
  • How often do you use the grouped DataFrame?
