<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: yarn.nodemanager.resource.memory-mb parameter update in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11302#M6296</link>
    <description>&lt;P&gt;Hi @Andriy Shevchenko,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Databricks does not use YARN. I recommend trying Databricks Community Edition (&lt;A href="https://community.cloud.databricks.com/login.html" alt="https://community.cloud.databricks.com/login.html" target="_blank"&gt;link&lt;/A&gt;) to get familiar with the platform and explore. You can check the Ganglia UI to see the cluster utilization: memory, CPU, I/O, etc.&lt;/P&gt;</description>
    <pubDate>Sat, 13 Nov 2021 00:16:56 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2021-11-13T00:16:56Z</dc:date>
    <item>
      <title>yarn.nodemanager.resource.memory-mb parameter update</title>
      <link>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11299#M6293</link>
      <description>&lt;P&gt;I am currently working on determining the proper cluster size for my Spark application, and I have a question about the Hadoop configuration parameter&amp;nbsp;&lt;B&gt;yarn.nodemanager.resource.memory-mb&lt;/B&gt;. From what I see, this parameter sets the physical limit of memory available to Spark containers on a worker node running under the YARN scheduler. What I noticed is that for a worker node of any size, this parameter is still set to 8192. This bothers me because it would imply that even for clusters with significantly larger workers, only&amp;nbsp;8192 MB is designated for executor memory. I have tried to override the property by adding it to the&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;/home/ubuntu/databricks/spark/dbconf/hadoop/core-site.xml&lt;/B&gt; file through a cluster init script. However, even though I set it there, it appears to be overridden from elsewhere. So I want to understand:&lt;/P&gt;&lt;P&gt;- whether the limit set here really caps the amount of executor memory for the cluster&lt;/P&gt;&lt;P&gt;- if so, how and where it should be overridden in order to properly utilize the memory available on the worker node&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 10:14:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11299#M6293</guid>
      <dc:creator>Andriy_Shevchen</dc:creator>
      <dc:date>2021-11-09T10:14:06Z</dc:date>
    </item>
    <item>
      <title>Re: yarn.nodemanager.resource.memory-mb parameter update</title>
      <link>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11301#M6295</link>
      <description>&lt;P&gt;Databricks does not use YARN, AFAIK (see &lt;A href="https://community.databricks.com/s/question/0D53f00001GHVOQCA5/databricks-spark-vs-spark-on-yarn" alt="https://community.databricks.com/s/question/0D53f00001GHVOQCA5/databricks-spark-vs-spark-on-yarn" target="_blank"&gt;this topic&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;Memory allocation is handled by spark.executor.memory.&lt;/P&gt;&lt;P&gt;The amount of memory available to each executor is allocated within the Java Virtual Machine (JVM) memory heap.&lt;/P&gt;&lt;P&gt;Here is some more detail:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/kb/clusters/spark-executor-memory" alt="https://docs.microsoft.com/en-us/azure/databricks/kb/clusters/spark-executor-memory" target="_blank"&gt;Azure&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/clusters/spark-executor-memory.html" alt="https://kb.databricks.com/clusters/spark-executor-memory.html" target="_blank"&gt;AWS&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can also do a test run on a cluster and then monitor the workers and driver using Ganglia, which gives you a view of what's going on and how much memory is allocated and used.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 10:03:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11301#M6295</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-10T10:03:57Z</dc:date>
    </item>
    <item>
      <title>Re: yarn.nodemanager.resource.memory-mb parameter update</title>
      <link>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11302#M6296</link>
      <description>&lt;P&gt;Hi @Andriy Shevchenko,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Databricks does not use YARN. I recommend trying Databricks Community Edition (&lt;A href="https://community.cloud.databricks.com/login.html" alt="https://community.cloud.databricks.com/login.html" target="_blank"&gt;link&lt;/A&gt;) to get familiar with the platform and explore. You can check the Ganglia UI to see the cluster utilization: memory, CPU, I/O, etc.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Nov 2021 00:16:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/yarn-nodemanager-resource-memory-mb-parameter-update/m-p/11302#M6296</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-13T00:16:56Z</dc:date>
    </item>
  </channel>
</rss>

