Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job optimization

sashikanth
New Contributor

How to increase resource efficiency in Databricks jobs?

We see that our idle cost is higher than our utilization cost. Any guidelines would be helpful. Please share some examples.

2 REPLIES

shashank853
Databricks Employee

Hi,

You can look at the following areas to manage idle costs:

Auto-scaling and Auto-termination:

Auto-scaling: Enable auto-scaling to dynamically adjust the number of worker nodes based on job requirements. This helps in scaling up during high demand and scaling down during low demand.
Auto-termination: Configure clusters to automatically terminate after a set period of inactivity. This prevents idle clusters from incurring unnecessary costs.
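Both settings live in the cluster specification. Here is a minimal sketch of a Clusters API-style spec (expressed as a Python dict); the runtime version and node type are placeholders, so substitute values available in your workspace:

```python
# Sketch of a cluster spec combining auto-scaling with auto-termination.
# Runtime and node type below are illustrative examples, not recommendations.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",  # pick a supported Databricks runtime
    "node_type_id": "i3.xlarge",          # example AWS node type
    "autoscale": {                        # scale between 2 and 8 workers on demand
        "min_workers": 2,
        "max_workers": 8,
    },
    "autotermination_minutes": 30,        # terminate after 30 idle minutes
}
```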

Use Job Compute:

Job Compute vs. All-Purpose Compute: Running non-interactive workloads on job compute instances is more cost-effective than using all-purpose compute instances.
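As a sketch, a Jobs API 2.1-style job definition can declare its cluster inline, so the cluster exists only for the duration of the run and is billed at the jobs-compute rate. Names and notebook paths below are illustrative:

```python
# Hypothetical job definition (shown as a Python dict): the cluster under
# "job_clusters" is created for the run and torn down afterwards, so no
# all-purpose cluster sits idle between runs.
job_spec = {
    "name": "nightly-etl",
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # example runtime
                "node_type_id": "i3.xlarge",          # example node type
                "num_workers": 4,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "run_etl",
            "job_cluster_key": "etl_cluster",  # runs on job compute
            "notebook_task": {"notebook_path": "/Jobs/nightly_etl"},
        }
    ],
}
```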

Choose the Right Instance Type

Instance Type Selection: Select instance types based on workload characteristics. For example, use memory-optimized instances for ML tasks and compute-optimized instances for streaming workloads.
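As a rough orientation (AWS family names shown; Azure and GCP have equivalents, and the right choice always depends on profiling your actual workload):

```python
# Illustrative starting points only -- not Databricks guidance for every case.
instance_family_by_workload = {
    "ml_training":       "r5 (memory-optimized)",
    "streaming":         "c5 (compute-optimized)",
    "ad_hoc_analytics":  "m5 (general-purpose)",
    "heavy_shuffle_etl": "i3 (storage-optimized, local NVMe for shuffle spill)",
}
```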

Efficient Compute Size:

Compute Sizing Considerations: Consider factors like total executor cores, memory, and local storage when sizing your compute. This ensures optimal resource utilization and cost efficiency.
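A back-of-the-envelope way to reason about core counts is to start from partition sizes. The numbers below are common rules of thumb (roughly 128 MB partitions, a few partitions per core), not Databricks guarantees; validate against the Spark UI for your own workload:

```python
# Rough sizing sketch: assumes ~128 MB partitions and a few partitions per
# core so executors stay busy without excessive scheduling overhead.
def estimate_cores(input_gb: float, partition_mb: int = 128,
                   partitions_per_core: int = 3) -> int:
    partitions = max(1, round(input_gb * 1024 / partition_mb))
    return max(1, round(partitions / partitions_per_core))

# For very large inputs you would cap the core count and let tasks queue
# rather than provisioning one core per few partitions.
```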

Design Cost-effective Workloads:

Balance Always-on and Triggered Streaming: For use cases that do not require up-to-the-minute data, replace always-on streams with triggered jobs that run on a schedule. Fewer, batched runs cost less than a cluster that never goes idle.
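For example, instead of keeping a streaming cluster up permanently, you can run the pipeline as a scheduled job (Jobs API-style schedule block sketched as a Python dict below) and have the streaming query use an available-now trigger so it processes what has arrived and then stops:

```python
# Hypothetical schedule block: run the pipeline every 6 hours instead of
# keeping an always-on streaming cluster. Cron expression is Quartz syntax.
schedule_spec = {
    "quartz_cron_expression": "0 0 0/6 * * ?",  # every 6 hours
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
```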

Check the doc: https://docs.databricks.com/en/lakehouse-architecture/cost-optimization/best-practices.html

-werners-
Esteemed Contributor III

My main improvements are:

- use singlenode job clusters for small data
- cluster reuse (so use the same job cluster for multiple tasks, in parallel or serial)
- use autoscaling only when it is very hard to find a good fixed sizing, otherwise go for fixed size.
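For the first point, a single-node job cluster follows the documented pattern of zero workers plus the single-node profile; the runtime and node type below are placeholders:

```python
# Sketch of a single-node cluster spec: driver only, no idle workers to pay
# for -- a good fit for small-data jobs.
single_node_cluster = {
    "spark_version": "15.4.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",          # example node type
    "num_workers": 0,                     # driver only
    "spark_conf": {
        "spark.master": "local[*]",
        "spark.databricks.cluster.profile": "singleNode",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```

For the second point, cluster reuse across tasks is what the `job_cluster_key` mechanism in Jobs API 2.1 provides: several tasks in one job can reference the same `job_clusters` entry instead of each spinning up its own cluster.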
