
E series vs F series VMs

Sainath368
New Contributor III

Hi all,
I need to run weekly maintenance on approximately 7,000 tables in my Databricks environment, involving OPTIMIZE, VACUUM, and ANALYZE TABLE (for statistics calculation) on all tables.

My question is: between the Ev4, Edv4, and Fsv2 VM series, which would be best suited for the driver and worker nodes in a Databricks cluster handling this workload, especially considering time constraints?

I'm looking for recommendations on the VM series that would minimize task completion times while balancing cost and resource efficiency.

1 ACCEPTED SOLUTION


mani_22
Databricks Employee

@Sainath368 OPTIMIZE and VACUUM are compute-intensive operations, so a compute-optimized instance such as the F series (Fsv2 in your list), which has a higher CPU-to-memory ratio, is a good choice for both the driver and the workers.
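For illustration only, here is a minimal Python sketch of how the weekly run over many tables could be parallelized so that the cluster's cores stay busy. The function names, the `run_sql` callable, and the `max_workers` default are assumptions made for this example, not part of any Databricks API; in a notebook, `run_sql` could plausibly be `spark.sql`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def maintenance_statements(table: str) -> list[str]:
    """Build the three weekly maintenance statements for one table."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table}",  # relies on the table's default retention period
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS",
    ]


def run_maintenance(
    tables: Iterable[str],
    run_sql: Callable[[str], object],  # hypothetical hook, e.g. spark.sql
    max_workers: int = 8,              # assumed value; tune to the cluster
) -> dict[str, str]:
    """Run maintenance for each table in parallel; return table -> status."""

    def maintain(table: str) -> tuple[str, str]:
        try:
            # Statements for a single table run in order.
            for statement in maintenance_statements(table):
                run_sql(statement)
            return table, "ok"
        except Exception as exc:
            # Record the failure and keep going with the other tables.
            return table, f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(maintain, tables))
```

With around 7,000 tables, the returned status map makes it easy to retry only the tables that failed. The right `max_workers` depends on cluster size and table sizes, so treat the default above as a starting point.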

If these are Unity Catalog managed tables, I recommend enabling predictive optimization, which automatically runs VACUUM, OPTIMIZE, and ANALYZE on serverless compute.

Documentation: https://docs.databricks.com/aws/en/optimizations/predictive-optimization
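
As a rough sketch, predictive optimization can be switched on per catalog or schema with an `ALTER ... ENABLE PREDICTIVE OPTIMIZATION` statement, as I understand the documentation linked above. Check the syntax and prerequisites there before relying on it. The helper function below and the catalog name in the usage note are hypothetical and only show the shape of the statement.

```python
def predictive_optimization_statement(
    level: str, name: str, action: str = "ENABLE"
) -> str:
    """Build an ALTER statement that sets predictive optimization.

    level:  "CATALOG" or "SCHEMA"
    name:   catalog or schema name (hypothetical in the usage note)
    action: "ENABLE", "DISABLE", or "INHERIT"
    """
    level = level.upper()
    action = action.upper()
    if level not in {"CATALOG", "SCHEMA"}:
        raise ValueError(f"unsupported level: {level}")
    if action not in {"ENABLE", "DISABLE", "INHERIT"}:
        raise ValueError(f"unsupported action: {action}")
    return f"ALTER {level} {name} {action} PREDICTIVE OPTIMIZATION"
```

For example, `predictive_optimization_statement("catalog", "main")` builds `ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION`, which could then be run from a SQL editor or passed to `spark.sql`.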

