To optimize costs in Databricks while maintaining strong performance, combine right-sized cluster configurations, autoscaling, disciplined job scheduling, and robust monitoring. These proven practices are used by leading enterprises in 2025 to keep Databricks budgets lean without compromising productivity or analytical throughput.
Cluster Configuration Tips
- Right-size your compute clusters for their actual workload requirements: avoid over-provisioning by starting small and letting clusters scale up only when demand increases.
- Select instance types tailored to your specific workload. For example, use memory-optimized nodes for ETL/ML tasks, or general-purpose compute for lighter jobs.
- Use spot or preemptible instances when jobs are fault-tolerant, as these typically cost less than on-demand nodes (see the sketch after this list).
- Regularly review and update cluster types in line with the latest cloud VM options for cost-effective performance.
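As a concrete illustration, here is a minimal sketch of a cluster definition sent to the Databricks Clusters REST API (`/api/2.0/clusters/create`) that combines a memory-optimized node type, a small autoscaling range, and spot workers with on-demand fallback. This assumes an AWS workspace; the workspace URL, token, runtime version, node type, and tag values are placeholders to adapt to your cloud and workload.

```python
# Minimal sketch: create a right-sized cluster with spot workers via the
# Databricks Clusters REST API. Host, token, node type, and spark_version
# are illustrative placeholders for an AWS workspace.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

cluster_spec = {
    "cluster_name": "etl-right-sized",
    "spark_version": "15.4.x-scala2.12",    # pick a current LTS runtime
    "node_type_id": "r5.xlarge",            # memory-optimized for ETL/ML
    "autoscale": {"min_workers": 1, "max_workers": 4},  # start small
    "autotermination_minutes": 20,          # shut down idle clusters quickly
    "aws_attributes": {
        "first_on_demand": 1,                   # keep the driver on-demand
        "availability": "SPOT_WITH_FALLBACK",   # spot workers, fall back if reclaimed
        "spot_bid_price_percent": 100,
    },
    "custom_tags": {"team": "data-eng"},    # hypothetical tag for cost allocation
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```

The `first_on_demand` setting keeps the driver on a stable on-demand node so a spot reclamation only interrupts workers, which fault-tolerant jobs can absorb.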
Autoscaling Tactics
- Enable autoscaling in Databricks clusters to dynamically adjust the number of worker nodes based on real-time usage, scaling up during peak loads and shrinking back down when demand is minimal.
- Fine-tune autoscaling thresholds: set the minimum worker count as low as the workload allows on development clusters, and use short auto-termination windows (usually 15 to 30 minutes) to eliminate costs from idle resources (a configuration sketch follows this list).
- Take advantage of predictive autoscaling if available, using historical and runtime metrics to anticipate surges and optimize resource readiness.
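Even without predictive autoscaling, the classic knobs go a long way. Below is a hedged sketch of tightening an existing cluster's autoscaling floor and idle window via `/api/2.0/clusters/edit`; note that this endpoint replaces the cluster configuration, so required fields such as `spark_version` and `node_type_id` must be resent. All identifiers are placeholders.

```python
# Minimal sketch: tighten autoscaling and auto-termination on an existing
# cluster. /clusters/edit replaces the full spec, so required fields
# (spark_version, node_type_id) must be included again. Host, token, and
# IDs are illustrative placeholders.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

edit_spec = {
    "cluster_id": "<cluster-id>",
    "cluster_name": "dev-sandbox",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "m5.large",                          # general-purpose dev node
    "autoscale": {"min_workers": 1, "max_workers": 2},   # keep the floor low
    "autotermination_minutes": 15,                       # short idle window
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=edit_spec,
)
resp.raise_for_status()
```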
Scheduling and Job Management
- Schedule non-urgent or heavy jobs during off-peak hours to benefit from lower resource contention and potentially reduced cloud costs.
- Terminate clusters after jobs complete, and shut them down over nights and weekends when not in use, drastically reducing unnecessary expenses.
- Use dedicated job clusters for each job run rather than all-purpose clusters; job compute is billed at a lower DBU rate, and each run gets right-sized, ephemeral compute that terminates automatically when the job finishes (see the job definition sketch after this list).
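The sketch below shows one way to express this with the Jobs 2.1 API: a job that runs at an assumed off-peak hour on an ephemeral job cluster, which Databricks tears down automatically when the run completes. The notebook path, cron schedule, and cluster sizing are illustrative assumptions.

```python
# Minimal sketch: define a scheduled job on an ephemeral job cluster via the
# Jobs 2.1 API. Notebook path, schedule, and sizing are hypothetical.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/etl/nightly"},  # hypothetical path
            "new_cluster": {               # ephemeral job cluster, not all-purpose
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "r5.xlarge",
                "autoscale": {"min_workers": 1, "max_workers": 8},
            },
        }
    ],
    # Quartz cron: run at 02:00 every day, i.e. off-peak for most teams.
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```

Because the cluster exists only for the duration of the run, there is no idle compute to forget about overnight or on weekends.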
Monitoring and Best Practices
- Monitor cluster, job, and resource consumption closely using Databricks' built-in system tables or external tools for detailed cost analysis by project, team, or department (a sample system-table query follows this list).
- Implement resource and cluster tagging for granular cost allocation, empowering precise financial tracking and accountability across business units.
- Set up budget alerts and usage reports to receive proactive notifications when spending exceeds predefined thresholds.
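For example, if system tables are enabled in your account, a query like the following (run from a Databricks notebook) breaks down DBU consumption by a custom tag. The "team" tag key is a hypothetical choice; substitute whatever tagging scheme you apply to clusters and jobs, and note that system-table columns can vary by release.

```python
# Minimal sketch, run inside a Databricks notebook: summarize DBU
# consumption per team tag over the last 90 days from the billable-usage
# system table. Assumes system tables are enabled; "team" is a hypothetical
# custom tag applied to your clusters and jobs.
usage_by_team = spark.sql("""
    SELECT
        custom_tags['team']             AS team,   -- hypothetical tag key
        sku_name,
        DATE_TRUNC('month', usage_date) AS month,
        SUM(usage_quantity)             AS dbus
    FROM system.billing.usage
    WHERE usage_date >= DATE_SUB(CURRENT_DATE(), 90)
    GROUP BY 1, 2, 3
    ORDER BY dbus DESC
""")
display(usage_by_team)
```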
Data Storage and Query Performance
- Compress and prune data aggressively, use Delta Lake, and optimize partitioning and Z-ordering to reduce data scan times and compute costs for querying and ETL jobs (see the maintenance sketch below).
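A routine Delta Lake maintenance pass might look like the sketch below, run from a notebook on a schedule. The table and column names are hypothetical placeholders.

```python
# Minimal sketch, run inside a Databricks notebook: routine Delta Lake
# maintenance that compacts small files, co-locates data on common filter
# columns, and removes stale files. Table and columns are hypothetical.
table = "analytics.events"  # hypothetical Delta table

# Compact small files and Z-order by frequently filtered columns so
# queries can skip irrelevant files during scans.
spark.sql(f"OPTIMIZE {table} ZORDER BY (event_date, customer_id)")

# Remove files no longer referenced by the table, keeping 7 days of
# history for time travel (the default retention threshold).
spark.sql(f"VACUUM {table} RETAIN 168 HOURS")
```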
Applying these strategies can lead to cost reductions of 40–60% in some organizations while preserving (or even enhancing) performance and team agility.
For specialized use cases or unusual workload spikes, additional configuration or custom monitoring may be warranted. But for most enterprises, these concrete steps will deliver rapid results in both savings and efficiency.