Machine type for different operations in Azure Databricks

noorbasha534 — Mon, 28 Jul 2025 12:10:46 GMT

Dear all

Do we have a general recommendation for the virtual machine type to use for different operations in Azure Databricks? We are looking at the following:

1. VACUUM
2. OPTIMIZE
3. ANALYZE STATS
4. DESCRIBE TABLE HISTORY

I understood at a high level from the documentation that, since VACUUM lists the files first, which is a CPU-intensive operation, F-series VMs or similar are advised.

I would appreciate a recommendation with some rationale. Thanks.

Re: Machine type for different operations in Azure Databricks

szymon_dybczak — Mon, 28 Jul 2025 12:52:15 GMT

Hi @noorbasha534 ,

Here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose VMs.

Comprehensive Guide to Optimize Data Workloads | Databricks

But as you said, VACUUM is a compute-intensive operation, so running it on F series is also a good approach. They even recommend that type of compute in the article below:

VACUUM best practices on Delta Lake - Databricks

As for ANALYZE, it collects metadata about the data, so it's primarily I/O bound. General-purpose compute will be a good fit here, in my opinion.
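
To make the above easier to scan, here is a minimal Python sketch that summarizes the recommendations mentioned in this thread and builds the corresponding SQL statements. It is only an illustration: the dictionary reflects what was said here, not an official sizing matrix; the names `RECOMMENDED_VM_CATEGORY` and `maintenance_statements` and the table name `main.sales.orders` are hypothetical. "ANALYZE STATS" and "DESCRIBE TABLE HISTORY" from the question are assumed to mean `ANALYZE TABLE ... COMPUTE STATISTICS` and `DESCRIBE HISTORY`.

```python
# Summary of the VM categories mentioned in this thread (not an official matrix).
RECOMMENDED_VM_CATEGORY = {
    "OPTIMIZE": "compute-optimized",
    # The guide suggests general purpose; the VACUUM best-practices article
    # suggests compute-optimized (e.g. F series) because of the file listing.
    "VACUUM": "general-purpose or compute-optimized (e.g. F series)",
    "ANALYZE": "general-purpose",
    "DESCRIBE HISTORY": None,  # no recommendation was given in this thread
}


def maintenance_statements(table_name: str) -> dict:
    """Return the SQL statement for each maintenance operation on a table."""
    return {
        "OPTIMIZE": f"OPTIMIZE {table_name}",
        "VACUUM": f"VACUUM {table_name}",
        "ANALYZE": f"ANALYZE TABLE {table_name} COMPUTE STATISTICS",
        "DESCRIBE HISTORY": f"DESCRIBE HISTORY {table_name}",
    }


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    for operation, sql in maintenance_statements("main.sales.orders").items():
        category = RECOMMENDED_VM_CATEGORY[operation] or "not covered in this thread"
        print(f"{sql}  -- suggested compute: {category}")
```

On a Databricks cluster, each statement could then be passed to `spark.sql(...)` from a job running on the matching cluster type.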
