02-07-2022 06:06 AM
Hello,
We are new to Databricks and we would like to know if our working method is good.
Currently, we are working like this:
spark.sql("CREATE TABLE Temp AS SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb ON *** = *** WHERE *** >= ***")
With this method, are we using the full capacity of Databricks, like "Map & Reduce"?
Thanks.
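(For reference, the same statement can also be written with the DataFrame API; Spark distributes the work the same way in both cases. A minimal sketch, assuming hypothetical tables aaa and bbb with hypothetical columns key, amt, and val, where spark is the SparkSession predefined in Databricks notebooks:)

from pyspark.sql import functions as F

# Hypothetical table and column names, purely for illustration.
aaa = spark.table("aaa")
bbb = spark.table("bbb")

result = (
    aaa.join(bbb, on="key", how="left")   # LEFT JOIN on an explicit key
       .where(F.col("amt") >= 100)        # filter, like the WHERE clause
       .agg(F.avg("val"), F.sum("val"))   # aggregates computed in parallel across the cluster
)
result.write.saveAsTable("Temp")          # persist the result as a managed table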
02-07-2022 06:25 AM
Hi @Salah K.! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.
02-08-2022 02:23 AM
Spark will handle the map/reduce for you.
As long as you use Spark-provided functions, whether in Scala, Python, or SQL (or even R), you will be using distributed processing.
You just need to describe the result you want.
Later, when you are more familiar with Spark, you can start tuning (e.g. avoiding shuffles, trying other join types, etc.), as sketched below.
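One concrete example of such tuning: a broadcast hint ships a small table to every executor so Spark can do a broadcast hash join and skip shuffling the large side. A minimal sketch, assuming a hypothetical small table bbb and a hypothetical join column key:

from pyspark.sql.functions import broadcast

# Assumption: bbb is small enough to fit in executor memory.
large = spark.table("aaa")
small = spark.table("bbb")

# The broadcast hint steers Spark toward a broadcast hash join,
# which avoids shuffling the large table across the network.
joined = large.join(broadcast(small), on="key", how="left")

The same hint is available in SQL as SELECT /*+ BROADCAST(bbb) */ ... .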
02-08-2022 04:53 AM
Thank you.