Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
derek_sun
Databricks Employee

Data sharing among collaborators has become increasingly important across industries, but it must be done in compliance with data protection regulations like CCPA and GDPR. Databricks offers two key solutions to enable secure and compliant data collaboration: Delta Sharing and Databricks Clean Rooms. The choice between Delta Sharing and Clean Rooms depends on specific sharing requirements. Let’s dive in and figure out which choice is right for you!

What is Delta Sharing?

Delta Sharing is an open protocol for secure data sharing across different platforms and clouds. It allows organizations to share large datasets without data duplication, providing real-time access and centralized governance. 


Features such as Partition Sharing, Views Sharing, and Dynamic Views Sharing let you scale data sharing as your business grows while maintaining security, flexibility, and compliance. They differ as follows:

  1. Partition Sharing: Allows sharing specific subsets of large datasets, reducing data transfer and improving performance as your data volume grows.
  2. Views Sharing: Enables sharing of curated data views without exposing underlying tables, facilitating easier management of shared data as your offerings expand.
  3. Dynamic Views Sharing: Provides fine-grained access control and data masking, scaling your ability to customize data access for different partners or customers.
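
As a rough illustration of Partition Sharing, the Python sketch below models a table as a list of rows and shares only the rows in one partition. The helper `share_partition`, the `sales` table, and the `region` column are hypothetical names made up for this example and are not part of the Delta Sharing API; in Databricks the partition filter is declared when the table is added to a share.

```python
from typing import Any

Row = dict[str, Any]


def share_partition(table: list[Row], column: str, value: Any) -> list[Row]:
    """Return only the rows that fall in the shared partition (hypothetical helper)."""
    return [row for row in table if row.get(column) == value]


# Hypothetical table partitioned by region.
sales: list[Row] = [
    {"region": "EU", "order_id": 1, "amount": 120},
    {"region": "US", "order_id": 2, "amount": 80},
    {"region": "EU", "order_id": 3, "amount": 45},
]

# Only the EU partition is exposed to the recipient; US rows stay private.
eu_share = share_partition(sales, "region", "EU")
```

The recipient only ever receives the matching subset, which is why partition sharing reduces the amount of data transferred as the table grows.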

There are some differences between Views Sharing and Dynamic Views Sharing:

  1. Platform Requirements:
    • Views Sharing: Shareable across different platforms and cloud regions.
    • Dynamic Views Sharing: Requires both provider and recipient to be Databricks customers.
  2. Implementation:
    • Views Sharing: Uses standard SQL views, implementable across various database systems.
    • Dynamic Views Sharing: Employs complex logic with Databricks-specific features.
  3. Access Control:
    • Views Sharing: Offers read-only access to predefined data views.
    • Dynamic Views Sharing: Provides fine-grained control, including row-level and column-level security.
  4. Flexibility:
    • Views Sharing: Static view definition, identical for all recipients.
    • Dynamic Views Sharing: Adjusts visible data dynamically based on recipient or conditions.
  5. Recipient-Specific Customization:
    • Views Sharing: No support for recipient-specific functions.
    • Dynamic Views Sharing: Utilizes CURRENT_RECIPIENT() for customized data access per recipient.
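
To make the contrast concrete, here is a minimal Python sketch of the two behaviors. It is a conceptual model only, not Databricks code: the `orders` table, the `country` and `pii_access` property names, and both functions are assumptions made for illustration, and the `recipient` dictionary stands in for the properties that CURRENT_RECIPIENT() would expose for the querying recipient.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical base table; recipients never query it directly.
orders: list[Row] = [
    {"order_id": 1, "country": "US", "email": "a@example.com"},
    {"order_id": 2, "country": "DE", "email": "b@example.com"},
    {"order_id": 3, "country": "US", "email": "c@example.com"},
]


def static_view(table: list[Row]) -> list[Row]:
    """Views Sharing: one fixed definition, identical for every recipient."""
    return [{"order_id": r["order_id"], "country": r["country"]} for r in table]


def dynamic_view(table: list[Row], recipient: dict[str, str]) -> list[Row]:
    """Dynamic Views Sharing: visible rows and columns depend on the recipient.

    `recipient` is an assumed stand-in for recipient properties.
    """
    # Row-level security: keep only rows for the recipient's country.
    visible = [r for r in table if r["country"] == recipient.get("country")]
    if recipient.get("pii_access") == "true":
        return [dict(r) for r in visible]
    # Column-level masking for recipients without PII access.
    return [{**r, "email": "***"} for r in visible]


us_partner = dynamic_view(orders, {"country": "US", "pii_access": "false"})
de_partner = dynamic_view(orders, {"country": "DE", "pii_access": "true"})
```

Every recipient of `static_view` gets the same three rows, while each call to `dynamic_view` returns a different slice of the same table depending on the recipient's properties.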

What is missing in Delta Sharing?

Delta Sharing offers powerful features such as partition sharing, views sharing, and dynamic views sharing, which provide granular control over what is shared. However, one limitation remains: the recipient still has direct access to the raw shared data. This can be a concern when data owners need to collaborate on sensitive data without exposing the underlying raw information.

This is where Clean Rooms come into play. 

What are Databricks Clean Rooms?

Databricks Clean Rooms are secure, privacy-preserving environments that enable organizations to collaborate on sensitive data without compromising confidentiality or regulatory compliance. Clean Rooms create a central isolated environment managed by Databricks where collaborators can run computations on shared data without directly accessing each other's raw data. Delta Sharing alone does not provide this isolated workspace.
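
The idea of computing on combined data without exposing raw records can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Clean Rooms are implemented: the function `run_overlap_analysis`, the party names, and the customer IDs are all hypothetical. The point is only that the job sees both inputs while each party receives aggregates.

```python
def run_overlap_analysis(party_a_ids: set[str], party_b_ids: set[str]) -> dict[str, int]:
    """Hypothetical clean-room job: reads both inputs, returns only aggregates."""
    overlap = party_a_ids & party_b_ids
    return {
        "party_a_customers": len(party_a_ids),
        "party_b_customers": len(party_b_ids),
        "shared_customers": len(overlap),
    }


# Each party contributes its own data; neither sees the other's raw IDs.
retailer_ids = {"c1", "c2", "c3", "c4"}
advertiser_ids = {"c3", "c4", "c5"}

result = run_overlap_analysis(retailer_ids, advertiser_ids)
```

Only the counts in `result` leave the function, so neither party learns which specific customers the other holds.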

Multi-party collaboration

Clean Rooms enable multiple parties to collaborate securely in a shared environment, whereas Delta Sharing is primarily focused on one-to-one data sharing between a provider and a recipient.

Complex computations

Clean Rooms allow running complex computations, including machine learning and AI workloads, using various programming languages like SQL, Python, R, Scala, and Java. Delta Sharing focuses on data sharing rather than collaborative analysis.

Notebook sharing and execution

Collaborators can share and run approved notebooks within the Clean Room environment. This is not a feature of Delta Sharing alone.

Privacy controls and approval processes

Clean Rooms implement strict privacy controls, including requiring all collaborators to approve notebooks before execution. Delta Sharing does not inherently include these collaborative approval processes.
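
The approval requirement can be pictured as a simple gate: a notebook runs only after every collaborator has signed off. The Python sketch below models that rule. The class `CleanRoomNotebook`, its methods, and the collaborator names are hypothetical and do not correspond to any Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class CleanRoomNotebook:
    """Hypothetical model of a notebook that needs unanimous approval."""

    name: str
    collaborators: frozenset[str]
    approvals: set[str] = field(default_factory=set)

    def approve(self, collaborator: str) -> None:
        if collaborator not in self.collaborators:
            raise ValueError(f"{collaborator} is not a collaborator")
        self.approvals.add(collaborator)

    def can_run(self) -> bool:
        # Execution is allowed only when every collaborator has approved.
        return self.approvals == set(self.collaborators)

    def run(self) -> str:
        if not self.can_run():
            missing = sorted(self.collaborators - self.approvals)
            raise PermissionError(f"awaiting approval from: {', '.join(missing)}")
        return f"{self.name} executed"


notebook = CleanRoomNotebook("overlap_report", frozenset({"retailer", "advertiser"}))
notebook.approve("retailer")
blocked = not notebook.can_run()  # one approval is not enough
notebook.approve("advertiser")
outcome = notebook.run()
```

Modeling the check as set equality keeps the rule explicit: a single missing approval blocks execution, no matter how many other collaborators have agreed.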

Output data in Unity Catalog

Clean Rooms allow collaborators to get approved output data directly in their Unity Catalog for subsequent use cases. This integrated output management is not a feature of Delta Sharing alone.

Volume Sharing of Data and AI Models with Privacy and Control

Volume sharing in Databricks Clean Rooms enables secure sharing of non-tabular data such as PDFs, images, videos, and audio files, as well as the execution of ML/AI models. Clean Rooms with volume sharing unlock a wide range of data collaboration use cases across industries in which collaborators want to share AI/BI output while keeping their data and AI models private.

In summary, while Delta Sharing provides the foundation for secure data sharing, Databricks Clean Rooms build upon this to create a comprehensive, privacy-safe collaborative environment with additional features for analysis, computation, and workflow management.

Conclusion

Overall, the combination of Delta Sharing with Partition Sharing, Views Sharing, Dynamic Views Sharing, and Clean Rooms provides a comprehensive, secure, and flexible solution for data sharing and collaboration across organizations, while maintaining strong governance and privacy controls. This ecosystem empowers businesses to unlock the full potential of their data assets, fostering innovation and insights through secure collaboration, all while adhering to stringent data protection standards. As organizations continue to navigate the complexities of data sharing in an increasingly interconnected world, these technologies offer a robust framework for responsible and efficient data collaboration.

Stay tuned for part two of our series, where we'll dive deeper into the practical aspects of these sharing experiences. We'll walk you through how to set up each sharing method end-to-end, from the perspectives of providers, recipients, and collaborators. This hands-on guide will help you implement these powerful data sharing capabilities in your own organization, enabling you to fully leverage the benefits of secure and scalable data collaboration.

2 Comments
Brahmareddy
Honored Contributor

Well articulated @derek_sun. Thanks for sharing. Waiting for part 2 series.

 

sarath2
New Contributor

Do we have part 2 series for this one?