06-23-2021 08:25 AM
I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand that Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?
- Labels:
  - Apache Spark
  - EMR
  - Spark
  - Spark vs Spark
Accepted Solutions
06-23-2021 03:48 PM
Users often compare a Databricks cluster with a YARN cluster, but it's not an apples-to-apples comparison.
A Databricks cluster should instead be compared to a single Spark application submitted to YARN. A Spark application on YARN launches a driver container and executor containers on the cluster nodes; in yarn-cluster mode, the Application Master runs inside the driver container.
A Databricks cluster likewise has a driver container and executor containers launched on the cluster nodes. Unlike YARN, Databricks launches only one executor per virtual machine. The YARN Application Master is roughly comparable to the Chauffeur service in Databricks.
In this comparison, Databricks offers several benefits over YARN:
- Support for multiple languages and sessions within the same cluster.
- Optimized and improved auto-scaling: the auto-scaling algorithm used in Databricks is much more efficient than YARN's dynamic allocation feature.
- Faster and more reliable scheduling via Spark's standalone scheduler.
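To make the comparison concrete, here is what a typical yarn-cluster submission looks like: the driver (and Application Master) run in a container on the cluster, much like the driver container of a Databricks cluster. The script name and resource values below are purely illustrative:

```shell
# Illustrative yarn-cluster submission; the Application Master hosts the driver.
# Resource flags and my_job.py are placeholders, not recommended values.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 8g \
  my_job.py
```

On Databricks, the equivalent sizing (node type, number of workers) is instead declared in the cluster configuration, and each worker VM hosts exactly one executor.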
01-29-2025 08:47 AM
What about the disadvantages?
How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?
01-31-2025 11:02 AM
Ideally, you don't want to run multiple jobs on the same cluster. There is no clean way to separate the driver logs for each job. In the Spark UI, however, you can use the run IDs and job IDs to tell apart the Spark jobs belonging to a particular job run.
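If jobs do share a driver, one pragmatic workaround (a log-processing sketch, not a Databricks feature) is to have each job prefix its own log lines with its run ID and split the combined driver log afterwards. The `[run_id=...]` prefix convention below is a hypothetical one you would have to adopt yourself:

```python
import re
from collections import defaultdict

# Hypothetical convention: each job prefixes its log lines with "[run_id=<id>]".
RUN_ID_PATTERN = re.compile(r"\[run_id=([\w-]+)\]")

def split_driver_log(lines):
    """Group combined driver-log lines by their embedded run ID.

    Lines without a run_id tag go under the key None (cluster-level output).
    """
    grouped = defaultdict(list)
    for line in lines:
        match = RUN_ID_PATTERN.search(line)
        key = match.group(1) if match else None
        grouped[key].append(line)
    return dict(grouped)

log = [
    "[run_id=101] INFO starting ingest",
    "[run_id=202] INFO starting training",
    "INFO cluster heartbeat ok",
    "[run_id=101] INFO ingest done",
]
by_run = split_driver_log(log)
```

For the Spark UI side, PySpark's `SparkContext.setJobGroup(groupId, description)` can similarly tag each job's Spark jobs with an identifier of your choosing, which then appears in the UI's job listing.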
01-31-2025 02:37 PM
But isn't that a hard disadvantage compared to YARN clusters?
And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run parallel tasks.
If I ran spark-submits instead, would the logs be separated, since separate sessions would then spawn?

