12-05-2025 09:29 AM
Hi everyone,
I'm working on optimizing Databricks costs for a production-grade data pipeline (Spark + Delta Lake) on Azure. I’m looking for practical, field-tested strategies to reduce compute and storage spend without impacting performance.
So far, I’ve explored:
Auto-Optimize and Auto-Compact
Delta caching
Photon where supported
Spot instances (limited due to stability concerns)
Questions:
What are the most impactful cost optimizations you’ve applied in real-world pipelines?
Do you prefer Jobs clusters or All-purpose clusters for cost efficiency?
Any best practices for minimizing storage costs with Delta Lake (versioning, retention, vacuum, etc.)?
How do you tune cluster size smartly to avoid over-provisioning?
Any monitoring tools or dashboards you recommend for ongoing cost governance?
Any detailed recommendations, examples, or references would be super helpful.
Thanks!
12-05-2025 10:25 PM
Hello @Poorva21 ,
Below are the answers to your questions:
Q1. What are the most impactful cost optimisations for production pipelines?
I have worked with multiple customers and, based on that experience, these are the high-level optimisations every production pipeline should have:
Q2. Jobs clusters vs all-purpose clusters: which is more cost-efficient?
Q3. How do I minimise storage costs with Delta Lake (versioning, retention, VACUUM, etc.)?
Q4. How do I tune cluster size smartly to avoid over-provisioning?
Q5. What monitoring tools or dashboards should I use for ongoing cost governance?
Please let me know if you have any further questions. Additionally, if you find this answer helpful, please accept it as a solution.