Hi @Kroy, when it comes to shared compute resources in Databricks, there are some best practices and options you can consider:
Shared Access Mode for Clusters:
- Clusters created with shared (user isolation) access mode can be used by several users at once, with Unity Catalog enforcing each user's data permissions, so a single cluster can safely serve multiple teams or customers.
All-Purpose Clusters:
- All-purpose clusters are intended for interactive and collaborative work; multiple users and notebooks can attach to the same cluster, which keeps compute shared during development and exploration.
Photon for Faster Queries:
- Photon, the vectorized engine in the Databricks Runtime, accelerates SQL and DataFrame workloads on the same hardware, improving price/performance for shared clusters. The sketch below shows shared access mode and Photon enabled on one cluster.
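To make the first three points concrete, here is a minimal sketch that creates a shared, Photon-enabled all-purpose cluster through the Databricks Clusters REST API. The workspace URL, token, runtime version, and node type are placeholders you would replace with your own values.

```python
# Minimal sketch: create a shared, Photon-enabled all-purpose cluster via the
# Databricks Clusters REST API. Host, token, runtime, and node type below are
# placeholders; substitute your own values.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "shared-analytics",
    "spark_version": "14.3.x-scala2.12",      # pick a current LTS runtime
    "node_type_id": "i3.xlarge",              # choose per cloud and workload
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "data_security_mode": "USER_ISOLATION",   # shared access mode (Unity Catalog)
    "runtime_engine": "PHOTON",               # enable Photon for faster queries
    "autotermination_minutes": 60,            # stop idle clusters to control cost
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```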
Cluster Sizing Considerations:
- When creating clusters, choose the size of nodes and the number of workers based on the specific operations your workload performs.
- For example, if you expect frequent shuffles, a single large node can be more efficient than several smaller nodes, because shuffle data stays on one machine instead of crossing the network.
- Run VACUUM on a cluster with autoscaling set to 1-4 workers, where each worker has 8 cores, and increase the driver size if you hit out-of-memory errors during the vacuum (see the sketch after this list).
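As an illustration of the VACUUM guidance above, here is a small maintenance snippet you could run in a notebook attached to a 1-4 worker autoscaling cluster; the table name is hypothetical.

```python
# Sketch of a maintenance notebook cell for a small autoscaling cluster.
# `main.sales.orders` is a hypothetical Unity Catalog table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

# Optionally compact small files first so there is less for VACUUM to clean up later.
spark.sql("OPTIMIZE main.sales.orders")

# Remove files no longer referenced by the Delta table and older than 7 days
# (168 hours, the default retention). Shorter retention requires disabling a safety check.
spark.sql("VACUUM main.sales.orders RETAIN 168 HOURS")
```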
Job Clusters for Operationalization:
- Once you've completed development and are ready to operationalize your code, switch to running it on job clusters.
- Job clusters terminate when the job ends, which reduces resource usage and cost; they are ideal for orchestrated tasks (a sketch follows below).
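Below is a hedged sketch of defining such a job through the Jobs API (2.1), where the `new_cluster` block is the ephemeral job cluster created for the run and terminated afterwards. The host, token, notebook path, and node settings are placeholders.

```python
# Minimal sketch: define a job that runs a notebook on an ephemeral job cluster
# via the Databricks Jobs API (2.1). Host, token, notebook path, and node values
# are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/team/etl/main"},  # placeholder path
            "new_cluster": {                      # job cluster: created per run, terminated when the job ends
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 4,
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```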
Delta Sharing for Data Sharing:
- Delta Sharing is an open protocol for securely sharing live data with other workspaces or organizations without copying it, which is useful when several customers need access to the same datasets; a small consumer-side sketch follows.
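For the consumer side, here is a small sketch using the open-source delta-sharing Python connector; the profile file path and the share/schema/table names are made up for illustration.

```python
# Sketch of the consumer side of Delta Sharing, using the open-source
# `delta-sharing` Python connector (pip install delta-sharing).
# The profile file and share/schema/table names below are hypothetical.
import delta_sharing

profile = "/path/to/config.share"                 # credentials file provided by the data provider
table_url = profile + "#my_share.sales.orders"    # format: <share>.<schema>.<table>

# List everything the provider has shared with you.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into a pandas DataFrame (or use load_as_spark on a cluster).
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```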
Remember that while shared compute resources are possible, you'll need to carefully plan and configure your clusters based on your specific requirements and use cases.
Databricks provides flexibility, and with the right choices, you can achieve efficient resource utilization across multiple customers.