Hi @Kroy, when it comes to shared compute resources in Databricks, there are some best practices and options you can consider:
Shared Access Mode for Clusters:
- Clusters created with shared (user isolation) access mode can be used by several users at once, with Unity Catalog enforcing each user's data permissions, so a single cluster can safely serve multiple teams or customers.
All-Purpose Clusters:
- All-purpose clusters are intended for interactive and collaborative work; multiple users and notebooks can attach to the same cluster, which keeps compute shared during development and exploration.
Photon for Faster Queries:
- Photon, the vectorized engine in the Databricks Runtime, accelerates SQL and DataFrame workloads on the same hardware, improving price/performance for shared clusters. The sketch below shows shared access mode and Photon enabled on one cluster.
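To make the first three points concrete, here is a minimal sketch that creates a shared, Photon-enabled all-purpose cluster through the Databricks Clusters REST API. The workspace URL, token, runtime version, and node type are placeholders you would replace with your own values.

```python
# Minimal sketch: create a shared, Photon-enabled all-purpose cluster via the
# Databricks Clusters REST API. Host, token, runtime, and node type below are
# placeholders; substitute your own values.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "shared-analytics",
    "spark_version": "14.3.x-scala2.12",      # pick a current LTS runtime
    "node_type_id": "i3.xlarge",              # choose per cloud and workload
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "data_security_mode": "USER_ISOLATION",   # shared access mode (Unity Catalog)
    "runtime_engine": "PHOTON",               # enable Photon for faster queries
    "autotermination_minutes": 60,            # stop idle clusters to control cost
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```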
Cluster Sizing Considerations:
- When creating clusters, choose the size of nodes and the number of workers based on the specific operations your workload performs.
- For example, if you expect frequent shuffles, a single large node can be more efficient than several smaller nodes, because shuffle data stays on one machine instead of crossing the network.
- Run VACUUM on a cluster with autoscaling set to 1-4 workers, where each worker has 8 cores, and increase the driver size if you hit out-of-memory errors during the vacuum (see the sketch after this list).
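As an illustration of the VACUUM guidance above, here is a small maintenance snippet you could run in a notebook attached to a 1-4 worker autoscaling cluster; the table name is hypothetical.

```python
# Sketch of a maintenance notebook cell for a small autoscaling cluster.
# `main.sales.orders` is a hypothetical Unity Catalog table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

# Optionally compact small files first so there is less for VACUUM to clean up later.
spark.sql("OPTIMIZE main.sales.orders")

# Remove files no longer referenced by the Delta table and older than 7 days
# (168 hours, the default retention). Shorter retention requires disabling a safety check.
spark.sql("VACUUM main.sales.orders RETAIN 168 HOURS")
```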
Job Clusters for Operationalization:
- Once you've completed development and are ready to operationalize your code, switch to running it on job clusters.
- Job clusters terminate when the job ends, which reduces resource usage and cost; they are ideal for orchestrated tasks (a sketch follows below).
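Below is a hedged sketch of defining such a job through the Jobs API (2.1), where the `new_cluster` block is the ephemeral job cluster created for the run and terminated afterwards. The host, token, notebook path, and node settings are placeholders.

```python
# Minimal sketch: define a job that runs a notebook on an ephemeral job cluster
# via the Databricks Jobs API (2.1). Host, token, notebook path, and node values
# are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/team/etl/main"},  # placeholder path
            "new_cluster": {                      # job cluster: created per run, terminated when the job ends
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 4,
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```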
Delta Sharing for Data Sharing:
- Delta Sharing is an open protocol for securely sharing live data with other workspaces or organizations without copying it, which is useful when several customers need access to the same datasets; a small consumer-side sketch follows.
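For the consumer side, here is a small sketch using the open-source delta-sharing Python connector; the profile file path and the share/schema/table names are made up for illustration.

```python
# Sketch of the consumer side of Delta Sharing, using the open-source
# `delta-sharing` Python connector (pip install delta-sharing).
# The profile file and share/schema/table names below are hypothetical.
import delta_sharing

profile = "/path/to/config.share"                 # credentials file provided by the data provider
table_url = profile + "#my_share.sales.orders"    # format: <share>.<schema>.<table>

# List everything the provider has shared with you.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into a pandas DataFrame (or use load_as_spark on a cluster).
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```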
Remember that while shared compute resources are possible, you'll need to carefully plan and configure your clusters based on your specific requirements and use cases.
Databricks provides flexibility, and with the right choices, you can achieve efficient resource utilization across multiple customers.