<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Full Memory Utilization in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122639#M4129</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132535"&gt;@Raghavan93513&lt;/a&gt;&amp;nbsp;, let me know if there is any spark.conf I can set, or anything else, that would help the job use a larger proportion of memory instead of limiting itself. Note: this is a pandas workflow (not using Spark so far).&lt;/P&gt;</description>
    <pubDate>Tue, 24 Jun 2025 09:14:58 GMT</pubDate>
    <dc:creator>harishgehlot_03</dc:creator>
    <dc:date>2025-06-24T09:14:58Z</dc:date>
    <item>
      <title>Full Memory Utilization</title>
      <link>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/121986#M4120</link>
      <description>&lt;P&gt;Hi Databricks Community. I need some suggestions on an issue. We use a Databricks Asset Bundle to deploy our forecasting repo, with AWS nodes running the forecast jobs and a workflow.yml file to trigger them.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I am using a single-node cluster because our forecasting module is currently pandas-based only (no Spark or distribution, though we do use joblib Parallel).&lt;/LI&gt;&lt;LI&gt;So far we have used an r6i.xlarge node (32 GB &amp;amp; 4 cores). On this node our code utilizes 28-30 GB and keeps the rest free. The job took 15 hours to complete.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harishgehlot_03_0-1750165393984.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17571iC5AC6CD2BD70E8FE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="harishgehlot_03_0-1750165393984.png" alt="harishgehlot_03_0-1750165393984.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Now I've switched to an r6i.4xlarge (128 GB &amp;amp; 16 cores), expecting it to run much faster than the r6i.xlarge, but what I observed is that it still uses only around 30-31 GB while the other ~90 GB stays free. I expected the job to expand into the extra memory and complete faster.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harishgehlot_03_1-1750165586159.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17572iACE8EE9E2602B0A5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="harishgehlot_03_1-1750165586159.png" alt="harishgehlot_03_1-1750165586159.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below are the workflow and cluster settings being used.
Let me know if something needs to be changed or tuned. Tagging&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154481"&gt;@Shua42&lt;/a&gt;&amp;nbsp;, because you also helped me before. Thanks in advance.&lt;/P&gt;&lt;LI-CODE lang="yaml"&gt;  dev:
    resources:
      clusters:
        dev_cluster: &amp;amp;dev_cluster
          num_workers: 0
          kind: CLASSIC_PREVIEW
          is_single_node: true
          spark_version: 14.3.x-scala2.12
          node_type_id: r6i.4xlarge
          custom_tags:
            clusterSource: ts-forecasting-2
            ResourceClass: SingleNode
          data_security_mode: SINGLE_USER
          enable_elastic_disk: true
          enable_local_disk_encryption: false
          autotermination_minutes: 20
          docker_image:
            url: "*****.amazonaws.com/dev-databricks:retailforecasting-latest"
          aws_attributes:
            availability: SPOT
            instance_profile_arn: ****
            ebs_volume_type: GENERAL_PURPOSE_SSD
            ebs_volume_count: 1
            ebs_volume_size: 50
          spark_conf:
            spark.databricks.cluster.profile: singleNode
            spark.memory.offHeap.enabled: false
            spark.driver.memory: 4g&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 13:11:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/121986#M4120</guid>
      <dc:creator>harishgehlot_03</dc:creator>
      <dc:date>2025-06-17T13:11:09Z</dc:date>
    </item>
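[Editor's aside, a minimal sketch, not from the thread: a pandas workload is a single process whose memory footprint is determined by the data it loads, so moving to a larger node does not by itself consume more RAM. With joblib, the `n_jobs` parameter controls how many tasks run concurrently; raising it (for example `n_jobs=-1` to use every core) is what lets a job spread across a bigger node, at the cost of memory growing with the number of concurrent workers. The `forecast_one` helper and column names below are hypothetical stand-ins for the per-series forecasting step.]

```python
import pandas as pd
from joblib import Parallel, delayed

def forecast_one(group: pd.DataFrame) -> float:
    # Hypothetical stand-in for a per-series forecasting step.
    return group["y"].mean()

df = pd.DataFrame({"series": ["a", "a", "b", "b"],
                   "y": [1.0, 3.0, 2.0, 4.0]})
groups = [g for _, g in df.groupby("series")]

# n_jobs=-1 spreads the per-series tasks across all available cores.
# Total memory use scales with the number of concurrent workers and the
# size of each task's data, not with the size of the node itself.
results = Parallel(n_jobs=-1)(delayed(forecast_one)(g) for g in groups)
print(results)  # [2.0, 3.0]
```

[With `n_jobs` left at its old value, a bigger instance only adds idle cores and unused RAM, which matches the behavior described in the post above.]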
    <item>
      <title>Re: Full Memory Utilization</title>
      <link>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122492#M4127</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170190"&gt;@harishgehlot_03&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Good day!&lt;/P&gt;
&lt;P&gt;May I know what the time was in the second case, using an r6i.4xlarge instance type?&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 06:34:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122492#M4127</guid>
      <dc:creator>Raghavan93513</dc:creator>
      <dc:date>2025-06-23T06:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: Full Memory Utilization</title>
      <link>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122493#M4128</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132535"&gt;@Raghavan93513&lt;/a&gt;, thanks for responding. The second case took ~14 hours.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 06:41:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122493#M4128</guid>
      <dc:creator>harishgehlot_03</dc:creator>
      <dc:date>2025-06-23T06:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: Full Memory Utilization</title>
      <link>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122639#M4129</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132535"&gt;@Raghavan93513&lt;/a&gt;&amp;nbsp;, let me know if there is any spark.conf I can set, or anything else, that would help the job use a larger proportion of memory instead of limiting itself. Note: this is a pandas workflow (not using Spark so far).&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 09:14:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/full-memory-utilization/m-p/122639#M4129</guid>
      <dc:creator>harishgehlot_03</dc:creator>
      <dc:date>2025-06-24T09:14:58Z</dc:date>
    </item>
  </channel>
</rss>

