<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Machine type for different operations in Azure Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/machine-type-for-different-operations-in-azure-databricks/m-p/126693#M47740</link>
    <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;Is there a general recommendation for the virtual machine type to use for different operations in Azure Databricks? We are looking at the following:&lt;/P&gt;&lt;P&gt;1. VACUUM 2. OPTIMIZE 3. ANALYZE STATS 4. DESCRIBE TABLE HISTORY&lt;/P&gt;&lt;P&gt;From the documentation, I understand at a high level that because VACUUM first lists the files, which is a CPU-intensive operation, the F series is advised.&lt;/P&gt;&lt;P&gt;I would appreciate a recommendation with some rationale. Thanks.&lt;/P&gt;</description>
    <pubDate>Mon, 28 Jul 2025 12:10:46 GMT</pubDate>
    <dc:creator>noorbasha534</dc:creator>
    <dc:date>2025-07-28T12:10:46Z</dc:date>
    <item>
      <title>Machine type for different operations in Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/machine-type-for-different-operations-in-azure-databricks/m-p/126693#M47740</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;Is there a general recommendation for the virtual machine type to use for different operations in Azure Databricks? We are looking at the following:&lt;/P&gt;&lt;P&gt;1. VACUUM 2. OPTIMIZE 3. ANALYZE STATS 4. DESCRIBE TABLE HISTORY&lt;/P&gt;&lt;P&gt;From the documentation, I understand at a high level that because VACUUM first lists the files, which is a CPU-intensive operation, the F series is advised.&lt;/P&gt;&lt;P&gt;I would appreciate a recommendation with some rationale. Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 12:10:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/machine-type-for-different-operations-in-azure-databricks/m-p/126693#M47740</guid>
      <dc:creator>noorbasha534</dc:creator>
      <dc:date>2025-07-28T12:10:46Z</dc:date>
    </item>
    <item>
      <title>Re: Machine type for different operations in Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/machine-type-for-different-operations-in-azure-databricks/m-p/126698#M47742</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124839"&gt;@noorbasha534&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose VMs.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.databricks.com/discover/pages/optimize-data-workloads-guide#databricks-cluster" target="_blank" rel="noopener"&gt;Comprehensive Guide to Optimize Data Workloads | Databricks&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But as you said, VACUUM is a compute-intensive operation, so running it on the F series is also a good approach. They even recommend that type of compute, as shown below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1753707124150.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18576iAF82815DA6E6A2D8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1753707124150.png" alt="szymon_dybczak_0-1753707124150.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/delta/vacuum-best-practices-on-delta-lake" target="_blank" rel="noopener"&gt;VACUUM best practices on Delta Lake - Databricks&lt;/A&gt;&lt;/P&gt;&lt;P&gt;As for ANALYZE, it collects metadata about the data and is primarily I/O bound. In my opinion, general-purpose compute is a good fit here.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 12:52:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/machine-type-for-different-operations-in-azure-databricks/m-p/126698#M47742</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-28T12:52:15Z</dc:date>
    </item>
  </channel>
</rss>

