<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Spark Vs Spark on Yarn in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/108219#M42995</link>
    <description>&lt;P&gt;But isn’t that a serious disadvantage compared to YARN clusters?&lt;/P&gt;&lt;P&gt;And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run tasks in parallel.&lt;/P&gt;&lt;P&gt;If I ran spark-submit jobs instead, would the logs be separated, since each submission would finally spawn its own session?&lt;/P&gt;</description>
    <pubDate>Fri, 31 Jan 2025 22:37:17 GMT</pubDate>
    <dc:creator>de-qrosh</dc:creator>
    <dc:date>2025-01-31T22:37:17Z</dc:date>
    <item>
      <title>Databricks Spark Vs Spark on Yarn</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/21383#M14562</link>
      <description>&lt;P&gt;I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand that Databricks Spark is different from YARN. How is the Databricks architecture different from YARN's?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 15:25:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/21383#M14562</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-23T15:25:02Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Spark Vs Spark on Yarn</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/21384#M14563</link>
      <description>&lt;P&gt;Users often compare a Databricks cluster to a YARN cluster. It's not an apples-to-apples comparison.&lt;/P&gt;&lt;P&gt;A Databricks cluster should instead be compared to a single Spark application submitted on YARN. A Spark application on YARN has a driver container and executor containers launched on the cluster nodes, and the Application Master runs inside the driver container (yarn-cluster mode).&lt;/P&gt;&lt;P&gt;A Databricks cluster likewise has a driver container and executor containers launched on the cluster nodes. Unlike YARN, Databricks launches only one executor per virtual machine. The Application Master in YARN can be compared to the Chauffeur service in Databricks.&lt;/P&gt;&lt;P&gt;In this comparison, Databricks has several benefits over YARN:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Support for multiple languages/sessions within the same cluster.&lt;/LI&gt;&lt;LI&gt;Optimized and improved auto-scaling. The auto-scaling algorithm used in Databricks is considerably more efficient than YARN's dynamic allocation feature.&lt;/LI&gt;&lt;LI&gt;Faster and more reliable scheduling with Spark's standalone scheduler.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Wed, 23 Jun 2021 22:48:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/21384#M14563</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-23T22:48:33Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Spark Vs Spark on Yarn</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/107638#M42876</link>
      <description>&lt;P&gt;What about the disadvantages?&lt;/P&gt;&lt;P&gt;How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 16:47:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/107638#M42876</guid>
      <dc:creator>de-qrosh</dc:creator>
      <dc:date>2025-01-29T16:47:59Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Spark Vs Spark on Yarn</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/108183#M42992</link>
      <description>&lt;P&gt;Ideally, you don't want to run multiple jobs on the same cluster; there is no clean way to separate the driver logs for each job. In the Spark UI, however, you can use the run IDs and job IDs to separate out the Spark jobs belonging to a particular job.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jan 2025 19:02:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/108183#M42992</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2025-01-31T19:02:53Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Spark Vs Spark on Yarn</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/108219#M42995</link>
      <description>&lt;P&gt;But isn’t that a serious disadvantage compared to YARN clusters?&lt;/P&gt;&lt;P&gt;And the way I understood Workflows (and the team behind the UI component, among other things), we are clearly meant to reuse the same compute cluster and run tasks in parallel.&lt;/P&gt;&lt;P&gt;If I ran spark-submit jobs instead, would the logs be separated, since each submission would finally spawn its own session?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jan 2025 22:37:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-spark-vs-spark-on-yarn/m-p/108219#M42995</guid>
      <dc:creator>de-qrosh</dc:creator>
      <dc:date>2025-01-31T22:37:17Z</dc:date>
    </item>
  </channel>
</rss>

