Machine type for different operations in Azure Databricks
07-28-2025 05:10 AM
Dear all,
Do we have a general recommendation for the virtual machine type to use for different operations in Azure Databricks? We are looking at the following:
1. VACUUM
2. OPTIMIZE
3. ANALYZE STATS
4. DESCRIBE TABLE HISTORY
From the documentation, I understand at a high level that because VACUUM first lists the files, which is a CPU-intensive operation, a compute-optimized series such as the F series is advised.
I would appreciate a recommendation with some rationale. Thanks.
07-28-2025 05:50 AM - edited 07-28-2025 05:52 AM
Hi @noorbasha534 ,
Here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose VMs.
Comprehensive Guide to Optimize Data Workloads | Databricks
But as you said, VACUUM is a compute-intensive operation, so running it on the F series is also a good approach. They even recommend that type of compute in the article below:
VACUUM best practices on Delta Lake - Databricks
As for ANALYZE, it collects metadata about the data, so it's primarily I/O bound. General-purpose compute will be a good fit here, in my opinion.
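
If it helps, the suggestions above can be captured in a small lookup that a job-configuration script could consult. This is only a sketch of what this reply says, not an official Databricks matrix. The names `RECOMMENDED_FAMILIES`, `EXAMPLE_NODE_TYPE`, and `node_types_for` are hypothetical, and the VM sizes (`Standard_F8s_v2`, `Standard_D8s_v3`) are assumed examples of a compute-optimized and a general-purpose size; pick whatever sizes fit your workload and region. DESCRIBE TABLE HISTORY is left out because no recommendation was given for it.

```python
# VM families per maintenance operation, as suggested in this thread.
# VACUUM lists both options mentioned: general purpose (optimization guide)
# and compute optimized / F series (VACUUM best-practices article).
RECOMMENDED_FAMILIES = {
    "OPTIMIZE": ["compute_optimized"],
    "VACUUM": ["general_purpose", "compute_optimized"],
    "ANALYZE": ["general_purpose"],  # the reply's opinion, not a documented rule
}

# Assumed example Azure VM sizes for each family (illustrative only).
EXAMPLE_NODE_TYPE = {
    "compute_optimized": "Standard_F8s_v2",
    "general_purpose": "Standard_D8s_v3",
}


def node_types_for(operation: str) -> list[str]:
    """Return candidate node types for an operation, in the order listed above."""
    key = operation.strip().upper()
    if key not in RECOMMENDED_FAMILIES:
        raise KeyError(f"No recommendation recorded for {operation!r}")
    return [EXAMPLE_NODE_TYPE[family] for family in RECOMMENDED_FAMILIES[key]]


if __name__ == "__main__":
    for op in RECOMMENDED_FAMILIES:
        print(op, "->", ", ".join(node_types_for(op)))
```

The returned value could then be used as the node type when defining a dedicated job cluster for each maintenance task, so that OPTIMIZE and VACUUM do not have to share one cluster shape.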