Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job optimization

sashikanth
New Contributor

How to increase resource efficiency in Databricks jobs?

We see that our idle cost is higher than our utilization cost. Any guidelines would be helpful. Please share some examples.

2 REPLIES

shashank853
Databricks Employee

Hi,

You can look at the following areas to manage idle costs:

Auto-scaling and Auto-termination:

Auto-scaling: Enable auto-scaling to dynamically adjust the number of worker nodes based on job requirements. This helps in scaling up during high demand and scaling down during low demand.
Auto-termination: Configure clusters to automatically terminate after a set period of inactivity. This prevents idle clusters from incurring unnecessary costs.
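Both settings live in the cluster specification. Here is a minimal sketch of a Clusters API-style spec (expressed as a Python dict); the runtime version and node type are placeholders, so substitute values available in your workspace:

```python
# Sketch of a cluster spec combining auto-scaling with auto-termination.
# Runtime and node type below are illustrative examples, not recommendations.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",  # pick a supported Databricks runtime
    "node_type_id": "i3.xlarge",          # example AWS node type
    "autoscale": {                        # scale between 2 and 8 workers on demand
        "min_workers": 2,
        "max_workers": 8,
    },
    "autotermination_minutes": 30,        # terminate after 30 idle minutes
}
```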

Use Job Compute:

Job Compute vs. All-Purpose Compute: Running non-interactive workloads on job compute instances is more cost-effective than using all-purpose compute instances.
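As a sketch, a Jobs API 2.1-style job definition can declare its cluster inline, so the cluster exists only for the duration of the run and is billed at the jobs-compute rate. Names and notebook paths below are illustrative:

```python
# Hypothetical job definition (shown as a Python dict): the cluster under
# "job_clusters" is created for the run and torn down afterwards, so no
# all-purpose cluster sits idle between runs.
job_spec = {
    "name": "nightly-etl",
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # example runtime
                "node_type_id": "i3.xlarge",          # example node type
                "num_workers": 4,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "run_etl",
            "job_cluster_key": "etl_cluster",  # runs on job compute
            "notebook_task": {"notebook_path": "/Jobs/nightly_etl"},
        }
    ],
}
```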

Choose the Right Instance Type

Instance Type Selection: Select instance types based on workload characteristics. For example, use memory-optimized instances for ML tasks and compute-optimized instances for streaming workloads.
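As a rough orientation (AWS family names shown; Azure and GCP have equivalents, and the right choice always depends on profiling your actual workload):

```python
# Illustrative starting points only -- not Databricks guidance for every case.
instance_family_by_workload = {
    "ml_training":       "r5 (memory-optimized)",
    "streaming":         "c5 (compute-optimized)",
    "ad_hoc_analytics":  "m5 (general-purpose)",
    "heavy_shuffle_etl": "i3 (storage-optimized, local NVMe for shuffle spill)",
}
```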

Efficient Compute Size:

Compute Sizing Considerations: Consider factors like total executor cores, memory, and local storage when sizing your compute. This ensures optimal resource utilization and cost efficiency.
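A back-of-the-envelope way to reason about core counts is to start from partition sizes. The numbers below are common rules of thumb (roughly 128 MB partitions, a few partitions per core), not Databricks guarantees; validate against the Spark UI for your own workload:

```python
# Rough sizing sketch: assumes ~128 MB partitions and a few partitions per
# core so executors stay busy without excessive scheduling overhead.
def estimate_cores(input_gb: float, partition_mb: int = 128,
                   partitions_per_core: int = 3) -> int:
    partitions = max(1, round(input_gb * 1024 / partition_mb))
    return max(1, round(partitions / partitions_per_core))

# For very large inputs you would cap the core count and let tasks queue
# rather than provisioning one core per few partitions.
```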

Design Cost-effective Workloads:

Balance Always-on and Triggered Streaming: For use cases that do not require up-to-the-minute data, replace always-on streams with triggered jobs that run on a schedule. Fewer, batched runs cost less than a cluster that never goes idle.
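For example, instead of keeping a streaming cluster up permanently, you can run the pipeline as a scheduled job (Jobs API-style schedule block sketched as a Python dict below) and have the streaming query use an available-now trigger so it processes what has arrived and then stops:

```python
# Hypothetical schedule block: run the pipeline every 6 hours instead of
# keeping an always-on streaming cluster. Cron expression is Quartz syntax.
schedule_spec = {
    "quartz_cron_expression": "0 0 0/6 * * ?",  # every 6 hours
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
```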

Check the doc: https://docs.databricks.com/en/lakehouse-architecture/cost-optimization/best-practices.html

-werners-
Esteemed Contributor III

My main improvements are:

- use singlenode job clusters for small data
- cluster reuse (so use the same job cluster for multiple tasks, in parallel or serial)
- use autoscaling only when it is very hard to find a good fixed sizing, otherwise go for fixed size.
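For the first point, a single-node job cluster follows the documented pattern of zero workers plus the single-node profile; the runtime and node type below are placeholders:

```python
# Sketch of a single-node cluster spec: driver only, no idle workers to pay
# for -- a good fit for small-data jobs.
single_node_cluster = {
    "spark_version": "15.4.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",          # example node type
    "num_workers": 0,                     # driver only
    "spark_conf": {
        "spark.master": "local[*]",
        "spark.databricks.cluster.profile": "singleNode",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```

For the second point, cluster reuse across tasks is what the `job_cluster_key` mechanism in Jobs API 2.1 provides: several tasks in one job can reference the same `job_clusters` entry instead of each spinning up its own cluster.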
