Are we using the advantage of "Map & Reduce" ?

wyzer — Mon, 07 Feb 2022 14:06:57 GMT

Hello,

We are new to Databricks and would like to know whether our working method is sound.

Currently, we work like this:

spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)")

With this method, are we using the full capacity of Databricks, such as "Map & Reduce"?

Thanks.

-werners- — Tue, 08 Feb 2022 10:23:30 GMT

Spark will handle the map/reduce for you.

So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing.

You only need to care about the result you want.

Later, once you are more familiar with Spark, you can start tuning (e.g. avoiding shuffles, trying other join types, etc.).
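
To make the map/reduce point concrete, here is a rough conceptual sketch in plain Python of how an aggregate like avg() or sum() can be split into a per-partition "map" step and a merging "reduce" step. It is only an illustration and not Spark's actual implementation: the partitions, values, and helper names (map_partition, merge) are all made up for the example.

```python
from functools import reduce

# Hypothetical data, already split into partitions the way a cluster
# would spread rows across workers.
partitions = [
    [10.0, 20.0],
    [30.0],
    [40.0, 50.0, 60.0],
]

def map_partition(rows):
    """Map step: each partition computes its own partial (sum, count)."""
    return (sum(rows), len(rows))

def merge(a, b):
    """Reduce step: combine two partial (sum, count) pairs into one."""
    return (a[0] + b[0], a[1] + b[1])

# Each partition is processed independently (in parallel on a real cluster).
partials = [map_partition(p) for p in partitions]

# The partial results are then merged into a single (sum, count) pair.
total, count = reduce(merge, partials, (0.0, 0))
average = total / count if count else None

print(total, average)
```

With a SQL query or DataFrame call you only write the avg()/sum() part; splitting the work and merging the partial results is what the engine takes care of.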

wyzer — Tue, 08 Feb 2022 12:53:08 GMT

Thank you.