What is Delta Sharing?
Delta Sharing is a secure data-sharing platform developed by Databricks. It allows you to share data and AI assets within and across organizations, regardless of the computing platforms they use.
Here are the three ways you can share data using Delta Sharing:
- Databricks-to-Databricks Sharing Protocol:
  - Share data and AI assets from your Unity Catalog-enabled workspace with users who also have access to a Unity Catalog-enabled Databricks workspace.
  - This approach uses the built-in Delta Sharing server within Databricks.
  - Supports features like notebook sharing, Unity Catalog volume sharing, AI model sharing, data governance, auditing, and usage tracking.
  - Ideal for sharing within Databricks environments.
- Databricks Open Sharing Protocol:
  - Share tabular data managed in a Unity Catalog-enabled Databricks workspace with users on any computing platform.
  - Useful when you manage data using Unity Catalog and want to share it with users who don't use Databricks.
  - Also uses the built-in Delta Sharing server within Databricks.
- Customer-Managed Implementation:
  - Set up your own Delta Sharing server to share data from any platform to any platform, whether Databricks or not.
  - Provides flexibility but requires additional setup.
  - Not covered in Azure Databricks documentation.
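For the open sharing protocol, a recipient who doesn't use Databricks would typically read a shared table with the open-source `delta-sharing` Python connector. The sketch below is a minimal illustration, assuming the recipient has already downloaded a credential (profile) file from the provider. The file name `config.share` and the share, schema, and table names are hypothetical placeholders, not values from this post.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Needs `pip install delta-sharing`. The import is done lazily so that
    table_url() stays usable even when the package is not installed.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Hypothetical names, used only to show the address format.
    print(table_url("config.share", "sales_share", "retail", "orders"))
```

Calling `load_shared_table(...)` with a real profile file and real share names would fetch the data over the network, so it is left out of the demo above.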
Cost Considerations:
- Egress Cost: When sharing data across clouds or regions, egress fees may apply. However, sharing within a region incurs no egress cost.
- Provider Setup: Initial setup involves enabling Delta Sharing on a Unity Catalog metastore and configuring audits.
- Recipient Token Lifetime: Set an appropriate recipient token lifetime for each metastore.
- Rotating Credentials: Establish a process for rotating credentials.
- Security Best Practices: Follow the Delta Sharing security best practices to help protect sensitive data.
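To make the token lifetime and rotation points concrete, here is a small sketch of the arithmetic involved. It is only an illustration: the actual lifetime is configured on the metastore, and the helper names and the 7-day rotation margin below are assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone


def token_expiry(created_at: datetime, lifetime_seconds: int) -> datetime:
    """Return when a token created at `created_at` expires for a given lifetime."""
    return created_at + timedelta(seconds=lifetime_seconds)


def needs_rotation(expiry: datetime, now: datetime,
                   margin: timedelta = timedelta(days=7)) -> bool:
    """True once `now` is within `margin` of the expiry (or already past it)."""
    return now >= expiry - margin


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expiry = token_expiry(created, 90 * 24 * 60 * 60)  # assumed 90-day lifetime
    print(expiry.date(), needs_rotation(expiry, datetime(2024, 3, 25, tzinfo=timezone.utc)))
```

A scheduled job could run a check like this against your own record of when each recipient token was issued, so that rotation happens before recipients lose access.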
Optimization Strategies:
- File Coalescing:
  - Use the OPTIMIZE command to coalesce small files into larger ones.
  - Improves read query performance.
  - Consider running it daily (preferably at night) to balance cost and performance.
- Predictive Optimization:
  - Automatically runs OPTIMIZE for Delta tables.
- Clustering:
  - Use liquid clustering to manage Delta table data layout (requires Databricks Runtime 13.3 LTS or above).
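As a rough sketch of the file coalescing and clustering options, the helpers below build the SQL statements you might run from a scheduled Databricks job. The helper names, the table name, and the column names are hypothetical; only the `OPTIMIZE` and `ALTER TABLE ... CLUSTER BY` statement shapes are meant to reflect Databricks SQL.

```python
from typing import Optional, Sequence


def optimize_statement(table: str, where: Optional[str] = None) -> str:
    """Build an OPTIMIZE statement, optionally limited by a partition predicate."""
    stmt = f"OPTIMIZE {table}"
    if where:
        # The predicate should reference partition columns only.
        stmt += f" WHERE {where}"
    return stmt


def cluster_by_statement(table: str, columns: Sequence[str]) -> str:
    """Build an ALTER TABLE ... CLUSTER BY statement for the given columns."""
    if not columns:
        raise ValueError("at least one clustering column is required")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)})"


if __name__ == "__main__":
    # Hypothetical table and columns.
    for stmt in (
        optimize_statement("main.retail.orders", "order_date >= '2024-01-01'"),
        cluster_by_statement("main.retail.orders", ["customer_id", "order_date"]),
    ):
        print(stmt)
        # In a Databricks notebook or job you would run: spark.sql(stmt)
```

Keeping statement building separate from execution makes it easy to log or review what a nightly job will run before it touches a shared table.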
Remember, Delta Sharing empowers secure data collaboration, and thoughtful configuration and optimization help keep sharing efficient and cost-effective.