06-23-2021 08:25 AM
I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand that Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?
- Labels:
  - Apache Spark
  - EMR
  - Spark
  - Spark vs Spark
Accepted Solutions
06-23-2021 03:48 PM
Users often compare a Databricks cluster with a YARN cluster, but it's not an apples-to-apples comparison.
A Databricks cluster should instead be compared to a single Spark application submitted to YARN. A Spark application on YARN launches a driver container and executor containers on the cluster nodes; in yarn-cluster mode, the Application Master runs inside the driver container.
A Databricks cluster likewise has a driver container and executor containers launched on the cluster nodes. Unlike YARN, Databricks launches only one executor per virtual machine. The YARN Application Master is roughly comparable to the Chauffeur service in Databricks.
In this comparison, Databricks offers several benefits over YARN:
- Support for multiple languages and sessions within the same cluster.
- Optimized and improved auto-scaling: the auto-scaling algorithm used in Databricks is much more efficient than YARN's dynamic allocation feature.
- Faster and more reliable scheduling via Spark's standalone scheduler.
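To make the comparison concrete, here is what a typical yarn-cluster submission looks like: the driver (and Application Master) run in a container on the cluster, much like the driver container of a Databricks cluster. The script name and resource values below are purely illustrative:

```shell
# Illustrative yarn-cluster submission; the Application Master hosts the driver.
# Resource flags and my_job.py are placeholders, not recommended values.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 8g \
  my_job.py
```

On Databricks, the equivalent sizing (node type, number of workers) is instead declared in the cluster configuration, and each worker VM hosts exactly one executor.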
01-29-2025 08:47 AM
What about the disadvantages?
How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?
01-31-2025 11:02 AM
Ideally, you don't want to run multiple jobs on the same cluster. There is no clean way to separate the driver logs for each job. In the Spark UI, however, you can use the run IDs and job IDs to tell apart the Spark jobs belonging to a particular job run.
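If jobs do share a driver, one pragmatic workaround (a log-processing sketch, not a Databricks feature) is to have each job prefix its own log lines with its run ID and split the combined driver log afterwards. The `[run_id=...]` prefix convention below is a hypothetical one you would have to adopt yourself:

```python
import re
from collections import defaultdict

# Hypothetical convention: each job prefixes its log lines with "[run_id=<id>]".
RUN_ID_PATTERN = re.compile(r"\[run_id=([\w-]+)\]")

def split_driver_log(lines):
    """Group combined driver-log lines by their embedded run ID.

    Lines without a run_id tag go under the key None (cluster-level output).
    """
    grouped = defaultdict(list)
    for line in lines:
        match = RUN_ID_PATTERN.search(line)
        key = match.group(1) if match else None
        grouped[key].append(line)
    return dict(grouped)

log = [
    "[run_id=101] INFO starting ingest",
    "[run_id=202] INFO starting training",
    "INFO cluster heartbeat ok",
    "[run_id=101] INFO ingest done",
]
by_run = split_driver_log(log)
```

For the Spark UI side, PySpark's `SparkContext.setJobGroup(groupId, description)` can similarly tag each job's Spark jobs with an identifier of your choosing, which then appears in the UI's job listing.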
01-31-2025 02:37 PM
But isn't that a hard disadvantage compared to YARN clusters?
And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run parallel tasks.
If I ran spark-submits instead, would the logs be separated, since separate sessions would then spawn?

