02-07-2022 06:06 AM
Hello,
We are new to Databricks and we would like to know whether our working method is good.
Currently, we are working like this:
spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)")
With this method, are we using the full capacity of Databricks, i.e. distributed ("map & reduce") processing?
Thanks.
- Labels: Map, Method, Optimization
Accepted Solutions
02-08-2022 02:23 AM
Spark will handle the map/reduce for you.
So as long as you use Spark-provided functions, be it in Scala, Python or SQL (or even R), you will be using distributed processing.
You just need to care about what you want as a result.
Later, when you are more familiar with Spark, you can start tuning (e.g. trying to avoid shuffles, using other join types, etc.).
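For reference, here is a minimal PySpark sketch of the same kind of query; the table names (aaa, bbb) and columns (key, val1, val2) are hypothetical stand-ins for the *** placeholders. The point is only that the SQL string and the DataFrame API both compile to the same distributed Spark plan, so neither one is "more parallel" than the other.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source tables and columns, only for illustration.
aaa = spark.table("aaa")
bbb = spark.table("bbb")

result = (
    aaa.join(bbb, on="key", how="left")           # distributed (shuffle or broadcast) join
       .where(F.col("val1") >= 100)               # filters are pushed down where possible
       .agg(F.avg("val1").alias("avg_val1"),      # aggregations run as map-side partials
            F.sum("val2").alias("sum_val2"))      # followed by a reduce-style merge
)

# Same effect as CREATE TABLE Temp AS SELECT ... in the SQL version.
result.write.mode("overwrite").saveAsTable("Temp")
```

As a first tuning example, wrapping the smaller table with F.broadcast(bbb) in the join hints Spark to broadcast it to every executor and skip the shuffle.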
02-08-2022 04:53 AM
Thank you.

