<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Machine Type for VACUUM operation in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/63595#M32291</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, I ask you to read my post carefully once again to better understand my problem statement; maybe that will lead to a more meaningful discussion that benefits everyone. I already said that I run VACUUM after OPTIMIZE. I already said that I use F-series machines. I already said that I use F64 machines for the workers, with 8 workers in autoscale mode.&lt;/P&gt;</description>
    <pubDate>Wed, 13 Mar 2024 18:09:36 GMT</pubDate>
    <dc:creator>NOOR_BASHASHAIK</dc:creator>
    <dc:date>2024-03-13T18:09:36Z</dc:date>
    <item>
      <title>Machine Type for VACUUM operation</title>
      <link>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/63425#M32233</link>
      <description>&lt;P&gt;Dear all&lt;/P&gt;&lt;P&gt;I have a workflow with two tasks: one that runs OPTIMIZE, followed by one that runs VACUUM. I used a cluster with an F32s driver and 8 F64s workers (autoscaling enabled). Databricks launches all 8 workers as soon as OPTIMIZE starts.&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;According to the documentation, we should use F-series machines for OPTIMIZE and VACUUM operations because they are compute-intensive. However, when I use F-series machines, the CPU is barely used for the entire duration of the VACUUM step, on both the driver and the workers. On the driver it is a little higher, around 30%, because, I think, the driver performs the actual file deletions. In contrast, memory usage reaches about 50% on both the driver and the worker nodes.&lt;/DIV&gt;&lt;DIV&gt;As the screenshot below shows (captured for one of the workers, but the same pattern holds for the rest), CPU usage drops suddenly while memory usage remains substantial. This is the point at which VACUUM starts. During the OPTIMIZE step, both CPU and memory are well utilized on the worker and driver nodes.
I expected Databricks to scale down once VACUUM started, since the hardware is not fully used, but perhaps it does not because memory is still well used (only the CPU is idle).&lt;/DIV&gt;&lt;DIV&gt;Please advise on the best setup here.&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="NOOR_BASHASHAIK_0-1710268182562.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6624iD570198A9F700B45/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="NOOR_BASHASHAIK_0-1710268182562.png" alt="NOOR_BASHASHAIK_0-1710268182562.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 12 Mar 2024 18:41:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/63425#M32233</guid>
      <dc:creator>NOOR_BASHASHAIK</dc:creator>
      <dc:date>2024-03-12T18:41:03Z</dc:date>
    </item>
    <item>
      <title>Re: Machine Type for VACUUM operation</title>
      <link>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/63595#M32291</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, I ask you to read my post carefully once again to better understand my problem statement; maybe that will lead to a more meaningful discussion that benefits everyone. I already said that I run VACUUM after OPTIMIZE. I already said that I use F-series machines. I already said that I use F64 machines for the workers, with 8 workers in autoscale mode.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Mar 2024 18:09:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/63595#M32291</guid>
      <dc:creator>NOOR_BASHASHAIK</dc:creator>
      <dc:date>2024-03-13T18:09:36Z</dc:date>
    </item>
    <item>
      <title>Re: Machine Type for VACUUM operation</title>
      <link>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/67548#M33366</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;Were you able to get any useful help on this?&lt;/P&gt;</description>
      <pubDate>Mon, 29 Apr 2024 08:28:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/machine-type-for-vacuum-operation/m-p/67548#M33366</guid>
      <dc:creator>ArturOA</dc:creator>
      <dc:date>2024-04-29T08:28:04Z</dc:date>
    </item>
  </channel>
</rss>

