02-07-2022 06:06 AM
Hello,
We are new to Databricks and we would like to know if our working method is good.
Currently, we are working like this:
spark.sql("CREATE TABLE Temp AS SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb ON *** = *** WHERE *** >= ***")
With this method, are we using the full capacity of Databricks, like "Map & Reduce"?
Thanks.
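(For reference, the same statement can also be written with the DataFrame API; Spark distributes the work the same way in both cases. A minimal sketch, assuming hypothetical tables aaa and bbb with hypothetical columns key, amt, and val, where spark is the SparkSession predefined in Databricks notebooks:)

from pyspark.sql import functions as F

# Hypothetical table and column names, purely for illustration.
aaa = spark.table("aaa")
bbb = spark.table("bbb")

result = (
    aaa.join(bbb, on="key", how="left")   # LEFT JOIN on an explicit key
       .where(F.col("amt") >= 100)        # filter, like the WHERE clause
       .agg(F.avg("val"), F.sum("val"))   # aggregates computed in parallel across the cluster
)
result.write.saveAsTable("Temp")          # persist the result as a managed table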
02-07-2022 06:25 AM
Hi @Salah K.! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.
02-08-2022 02:23 AM
Spark will handle the map/reduce for you.
As long as you use Spark-provided functions, whether in Scala, Python, or SQL (or even R), you will be using distributed processing.
You just need to describe the result you want.
Later, when you are more familiar with Spark, you can start tuning (e.g. avoiding shuffles, trying other join types, etc.), as sketched below.
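One concrete example of such tuning: a broadcast hint ships a small table to every executor so Spark can do a broadcast hash join and skip shuffling the large side. A minimal sketch, assuming a hypothetical small table bbb and a hypothetical join column key:

from pyspark.sql.functions import broadcast

# Assumption: bbb is small enough to fit in executor memory.
large = spark.table("aaa")
small = spark.table("bbb")

# The broadcast hint steers Spark toward a broadcast hash join,
# which avoids shuffling the large table across the network.
joined = large.join(broadcast(small), on="key", how="left")

The same hint is available in SQL as SELECT /*+ BROADCAST(bbb) */ ... .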
02-08-2022 04:53 AM
Thank you.