Hello @dataailearner,
Greetings of the day!
Here are a few steps that you can follow to optimize costs:
1. Choose the most efficient compute size: Databricks runs one executor per worker node, so the total number of cores across all executors determines the maximum parallelism of the cluster. Size this total to what the workload can actually use, since cores that sit idle are still billed.
2. Dynamically allocate resources: With autoscaling, Databricks dynamically reallocates workers to account for the characteristics of your job. Autoscaling can reduce overall costs compared to a statically sized compute instance. However, scaling down cluster size for Structured Streaming workloads has limitations. For such cases, Databricks recommends using Delta Live Tables with Enhanced Autoscaling.
3. Use auto termination: Configure auto termination for all interactive compute resources. After a specified idle time, the compute resource shuts down. This can help control costs by reducing idle resources. For use cases where compute is needed only during business hours, compute resources can be configured with auto termination, and a scheduled process can restart compute in the morning before users are back at their desktops. If compute startup times are too long, consider using cluster pools.
4. Use compute policies to control costs: Compute policies can enforce many cost-specific restrictions for compute resources. For example, you can enable cluster autoscaling with a set minimum number of worker nodes, enable cluster auto termination with a reasonable value (for example, 1 hour) to avoid paying for idle time, and ensure that only cost-efficient VM instances can be selected.
5. Balance on-demand and spot (excess-capacity) instances: Spot instances take advantage of excess virtual machine resources in the cloud that are available at a lower price. To save costs, Databricks supports creating clusters using spot instances. As a best practice, the first instance (the Spark driver) should always be an on-demand virtual machine.
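To make steps 2 to 5 more concrete, here is a minimal Python sketch that builds a cluster spec and a compute policy definition as plain dictionaries. It assumes an AWS workspace (hence `aws_attributes`); the helper names `build_cluster_spec` and `build_cost_policy`, the runtime version, the node type, and all numeric limits are illustrative placeholders and not recommendations. Please check the field names against the Clusters API and compute policy reference for your cloud before using them.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       min_workers=1, max_workers=4, idle_minutes=60):
    """Return a cluster spec dict with autoscaling, auto termination and spot workers.

    Hypothetical helper: it only builds the JSON payload and does not call Databricks.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,      # placeholder runtime version
        "node_type_id": node_type_id,        # placeholder instance type
        # Step 2: let Databricks scale workers between these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Step 3: shut the cluster down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
        # Step 5 (AWS assumption): keep the first node (the driver) on-demand
        # and run the remaining nodes on spot, falling back to on-demand.
        "aws_attributes": {
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK",
        },
    }


def build_cost_policy(allowed_node_types, max_idle_minutes=60, max_workers=8):
    """Return a compute policy definition (step 4) that caps idle time and size."""
    return {
        # Users may pick an idle timeout, but only within this range.
        "autotermination_minutes": {
            "type": "range",
            "minValue": 10,
            "maxValue": max_idle_minutes,
            "defaultValue": max_idle_minutes,
        },
        # Fix the autoscaling floor and cap the ceiling.
        "autoscale.min_workers": {"type": "fixed", "value": 1},
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
        # Only allow instance types that were judged cost-efficient.
        "node_type_id": {"type": "allowlist", "values": list(allowed_node_types)},
    }


if __name__ == "__main__":
    spec = build_cluster_spec("etl-nightly", "15.4.x-scala2.12", "m5.xlarge")
    policy = build_cost_policy(["m5.xlarge", "m5.2xlarge"])
    print(json.dumps(spec, indent=2))
    print(json.dumps(policy, indent=2))
```

The resulting dictionaries can be serialized to JSON and passed to whichever tool you use to create clusters and policies (UI JSON editor, CLI, REST API, or Terraform). On Azure or GCP, the spot settings live under `azure_attributes` or `gcp_attributes` instead.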
Please feel free to refer to the doc below:
https://docs.databricks.com/en/lakehouse-architecture/cost-optimization/best-practices.html
Regards,
Ravi