<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Ganglia not working with custom container services in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/ganglia-not-working-with-custom-container-services/m-p/12721#M7486</link>
    <description>&lt;P&gt;Setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;custom Docker container built from the "databricksruntime/gpu-conda:cuda11" base image&lt;/LI&gt;&lt;LI&gt;10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)&lt;/LI&gt;&lt;LI&gt;multi-node, p3.8xlarge GPU compute&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I try to view Ganglia metrics, I get a "502 Bad Gateway" error:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/863i6E532334BEBD1F2C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Even after the cluster has been running for about an hour, there are no logs at all:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/861i83410AA78C4BD367/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;As a sanity check, I started another cluster without a custom Docker container (11.3 LTS ML, which includes Apache Spark 3.3.0, GPU, Scala 2.12), and the Ganglia metrics work fine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are there any known limitations with Ganglia metrics and custom Docker containers?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, when I use the custom Docker container, I am forced to use the standard runtime (10.4 LTS), because the Machine Learning runtimes do not support custom containers (see &lt;A href="https://docs.databricks.com/clusters/custom-containers.html#requirements" target="_blank"&gt;https://docs.databricks.com/clusters/custom-containers.html#requirements&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I suspect this could also be a source of the issue. Does the ML runtime provide any libraries that Ganglia needs to work on GPU compute?&lt;/P&gt;</description>
    <pubDate>Wed, 11 Jan 2023 02:04:31 GMT</pubDate>
    <dc:creator>jamesw</dc:creator>
    <dc:date>2023-01-11T02:04:31Z</dc:date>
    <item>
      <title>Ganglia not working with custom container services</title>
      <link>https://community.databricks.com/t5/data-engineering/ganglia-not-working-with-custom-container-services/m-p/12721#M7486</link>
      <description>&lt;P&gt;Setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;custom Docker container built from the "databricksruntime/gpu-conda:cuda11" base image&lt;/LI&gt;&lt;LI&gt;10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)&lt;/LI&gt;&lt;LI&gt;multi-node, p3.8xlarge GPU compute&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I try to view Ganglia metrics, I get a "502 Bad Gateway" error:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/863i6E532334BEBD1F2C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Even after the cluster has been running for about an hour, there are no logs at all:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/861i83410AA78C4BD367/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;As a sanity check, I started another cluster without a custom Docker container (11.3 LTS ML, which includes Apache Spark 3.3.0, GPU, Scala 2.12), and the Ganglia metrics work fine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are there any known limitations with Ganglia metrics and custom Docker containers?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, when I use the custom Docker container, I am forced to use the standard runtime (10.4 LTS), because the Machine Learning runtimes do not support custom containers (see &lt;A href="https://docs.databricks.com/clusters/custom-containers.html#requirements" target="_blank"&gt;https://docs.databricks.com/clusters/custom-containers.html#requirements&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I suspect this could also be a source of the issue. Does the ML runtime provide any libraries that Ganglia needs to work on GPU compute?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2023 02:04:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ganglia-not-working-with-custom-container-services/m-p/12721#M7486</guid>
      <dc:creator>jamesw</dc:creator>
      <dc:date>2023-01-11T02:04:31Z</dc:date>
    </item>
    <item>
      <title>Re: Ganglia not working with custom container services</title>
      <link>https://community.databricks.com/t5/data-engineering/ganglia-not-working-with-custom-container-services/m-p/12722#M7487</link>
      <description>&lt;P&gt;Hi @James W, Ganglia is not available by default for custom Docker containers; this is a known limitation.&lt;/P&gt;&lt;P&gt;However, you can try the experimental Ganglia support for custom Databricks Container Services (DCS) images:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/databricks/containers/tree/master/experimental/ubuntu/ganglia" alt="https://github.com/databricks/containers/tree/master/experimental/ubuntu/ganglia" target="_blank"&gt;https://github.com/databricks/containers/tree/master/experimental/ubuntu/ganglia&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2023 12:38:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ganglia-not-working-with-custom-container-services/m-p/12722#M7487</guid>
      <dc:creator>Vivian_Wilfred</dc:creator>
      <dc:date>2023-01-11T12:38:38Z</dc:date>
    </item>
  </channel>
</rss>

