Data Engineering

Are we taking advantage of "Map & Reduce"?

wyzer
Contributor II

Hello,

We are new to Databricks and would like to know whether our working method is good.

Currently, we work like this:

spark.sql("CREATE TABLE Temp AS SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb ON *** WHERE *** >= ***")

With this method, are we using the full capacity of Databricks, such as "Map & Reduce"?

Thanks.

Accepted Solution

-werners-
Esteemed Contributor III

Spark will handle the map/reduce for you.

As long as you use Spark-provided functions, whether in Scala, Python, SQL, or even R, you will be using distributed processing.
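
For illustration, here is a minimal PySpark sketch (not from the original thread; the tables "aaa" and "bbb" and the columns "id" and "amount" are invented names) showing the same kind of aggregation written with the DataFrame API instead of spark.sql. Both styles compile to the same distributed execution plan:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical tables; names are placeholders for illustration only.
aaa = spark.table("aaa")
bbb = spark.table("bbb")

result = (
    aaa.join(bbb, on="id", how="left")         # runs as a distributed join
       .where(F.col("amount") >= 100)          # distributed filter
       .agg(F.avg("amount"), F.sum("amount"))  # distributed aggregation
)

result.explain()  # prints the physical plan Spark will distribute

explain() is a convenient way to see the scan, shuffle, and aggregation stages Spark generates for you.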

You just need to care about what you want as a result.

Later, when you are more familiar with Spark, you can start tuning (e.g. avoiding shuffles, trying other join types, etc.).
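
As one example of that kind of tuning (again a sketch, not from the original post; table names are the same invented placeholders as above), broadcasting a small table avoids shuffling the larger one during the join:

from pyspark.sql.functions import broadcast

# If bbb is small enough to fit in executor memory, broadcasting it
# replaces the shuffle join with a broadcast hash join.
result = aaa.join(broadcast(bbb), on="id", how="left")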


2 Replies


wyzer
Contributor II

Thank you.
