06-23-2021 08:25 AM
I am moving my Spark workloads from EMR and an on-premises Spark cluster to Databricks. I understand Databricks Spark is different from Yarn. How is the Databricks architecture different from Yarn?
06-23-2021 03:48 PM
Users often compare a Databricks cluster with a Yarn cluster, but it's not an apples-to-apples comparison.
A Databricks cluster should be compared to a Spark application submitted on Yarn. A Spark application on Yarn has a driver container and executor containers launched on the cluster nodes, and the Application Master runs inside the driver container (in yarn-cluster mode).
A Databricks cluster likewise has a driver container and executor containers launched on the cluster nodes. Unlike Yarn, Databricks launches only one executor per virtual machine. The Application Master in Yarn can be compared with the Chauffeur service in Databricks.
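To make the sizing part of that mapping concrete, here is a minimal sketch, assuming the Databricks Clusters REST API (`POST /api/2.0/clusters/create`); the workspace URL, token, runtime version, and node type are placeholders you would substitute. Where you would pass `--num-executors` to spark-submit on Yarn, on Databricks the rough equivalent is `num_workers`, since each worker VM runs exactly one executor:

```python
import requests

# Placeholder workspace URL and personal access token; substitute your own.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# On Yarn you might size the application like:
#   spark-submit --master yarn --deploy-mode cluster \
#       --num-executors 8 --executor-cores 4 --executor-memory 16g app.py
# A rough Databricks equivalent is a cluster spec: each worker VM hosts
# exactly one executor, so num_workers plays the role of --num-executors.
cluster_spec = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",  # example Databricks Runtime version
    "node_type_id": "i3.xlarge",          # example AWS node type
    "num_workers": 8,                     # ~= --num-executors on Yarn
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])  # ID of the newly created cluster
```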
Viewed this way, Databricks offers several benefits over Yarn:
01-29-2025 08:47 AM
What about the disadvantages?
How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?
01-31-2025 11:02 AM
Ideally, you don't want to run multiple jobs on the same cluster. There is no clean way of separating the driver logs for each job. However, in the Spark UI, you can use the run IDs and job IDs to separate out the Spark jobs belonging to a particular job.
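If you do share a cluster, one common way to make the Spark jobs easier to tell apart in the Spark UI is to tag them from each task's own code with a job group and description. A minimal PySpark sketch; the group ID and description strings here are just illustrative values (using the workflow's run ID or job ID as the group ID makes the mapping back to the Databricks job obvious):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Tag everything this task runs so it is grouped and labeled in the Spark UI.
sc.setJobGroup("my_workflow_run_1234", "nightly ETL - load orders")
sc.setJobDescription("nightly ETL - load orders")

df = spark.range(1_000_000)
print(df.count())  # this Spark job appears under the group/description above
```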
01-31-2025 02:37 PM
But isn't that a hard disadvantage compared to Yarn clusters?
And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run parallel tasks.
If I ran spark-submit jobs instead, would the logs be separated, since separate sessions would ultimately be spawned?