Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Are we taking advantage of "Map & Reduce"?

wyzer
Contributor II

Hello,

We are new to Databricks and would like to know whether our working method is good.

Currently, we are working like this:

spark.sql("CREATE TABLE Temp AS SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb ON *** WHERE *** >= ***")

With this method, are we using the full capacity of Databricks, such as "Map & Reduce"?

Thanks.

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

Spark will handle the map/reduce for you.

So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing.

You just need to care about what you want as a result.
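
For instance, here is a minimal PySpark sketch of the same kind of query. The table names aaa and bbb are kept from the question; the column names "key" and "amount" and the threshold 100 are hypothetical stand-ins for the *** placeholders. Every step is planned and executed as distributed work:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

result = (
    spark.table("aaa")
    .join(spark.table("bbb"), on="key", how="left")   # distributed join
    .where(F.col("amount") >= 100)                    # distributed filter
    .agg(
        F.avg("amount").alias("avg_amount"),          # partial (map-side) aggregates,
        F.sum("amount").alias("sum_amount"),          # merged in a final (reduce-side) step
    )
)

# The physical plan makes the map/reduce structure visible: partial
# aggregation, an exchange (shuffle), then a final merge.
result.explain()

result.write.mode("overwrite").saveAsTable("Temp")

Whether a query is written with spark.sql(...) or the DataFrame API, Spark compiles it to the same kind of distributed plan.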

And afterwards, when you are more familiar with Spark, you can start tuning (e.g. trying to avoid shuffles, using other join types, etc.).
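
As one sketch of that kind of tuning (again with hypothetical table and column names), a broadcast hint ships a small table to every executor so the join avoids shuffling the large side:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# If bbb is small enough to fit on each executor, broadcasting it avoids
# shuffling aaa across the cluster (a common first tuning step).
joined = spark.table("aaa").join(broadcast(spark.table("bbb")), on="key", how="left")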


Thank you.
