Cost of using Delta Sharing with Unity Catalog

RajNath
New Contributor II

I am new to Databricks Delta Sharing. With Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of that, and how do I configure and optimize it for sharing large data? Any help or a link to more details would be appreciated.

2 REPLIES

Kaniz
Community Manager

Hi @RajNath, let’s dive into the world of Delta Sharing and explore how it works, its cost implications, and optimization strategies.

What is Delta Sharing?

Delta Sharing is a secure data-sharing platform developed by Databricks. It allows you to share data and AI assets within and across organizations, regardless of the computing platforms they use.

Here are the three ways you can share data using Delta Sharing:

  1. Databricks-to-Databricks Sharing Protocol:

    • Share data and AI assets from your Unity Catalog-enabled workspace with users who also have access to a Unity Catalog-enabled Databricks workspace.
    • This approach uses the built-in Delta Sharing server within Databricks.
    • Supports features like notebook sharing, Unity Catalog volume sharing, AI model sharing, data governance, auditing, and usage tracking.
    • Ideal for sharing within Databricks environments.
    • Learn more.
  2. Databricks Open Sharing Protocol:

    • Share tabular data managed in a Unity Catalog-enabled Databricks workspace with users on any computing platform.
    • Useful when you manage data using Unity Catalog and want to share it with users who don’t use Databricks.
    • Also uses the built-in Delta Sharing server within Databricks.
    • Learn more.
  3. Customer-Managed Implementation:

    • Set up your own Delta Sharing server to share data from any platform to any platform, whether Databricks or not.
    • Provides flexibility but requires additional setup.
    • Not covered in Azure Databricks documentation.
    • Learn more.
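For the open sharing case, a recipient outside Databricks would typically read a shared table with the open-source `delta-sharing` Python connector. The following is a minimal sketch, assuming the recipient has downloaded a credential (profile) file from the provider; the profile path and the share, schema, and table names are hypothetical placeholders.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Requires `pip install delta-sharing`; the import is done lazily so that
    table_url() can be used without the package installed.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Hypothetical names, for illustration only.
url = table_url("config.share", "sales_share", "retail", "orders")
```

Note that the read happens on the recipient's side against the provider's storage, which is why no cluster appears on the provider's workspace for the share itself.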

Cost Considerations:

  • Egress Cost: When data is shared across clouds or regions, your cloud provider may charge egress fees for the data that recipients read. Sharing within the same region incurs no egress cost.
  • Provider Setup: Initial setup involves enabling Delta Sharing on a Unity Catalog metastore and configuring audits.
  • Recipient Token Lifetime: Set an appropriate recipient token lifetime for each metastore.
  • Rotating Credentials: Establish a process for rotating credentials.
  • Security Best Practices can help protect sensitive data.
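To make the egress point concrete, here is a rough back-of-the-envelope sketch of how the egress portion could be estimated. The per-GB rate used below is a made-up placeholder, not a real price; check your cloud provider's pricing page for the actual rate between the two regions or clouds involved.

```python
def estimate_egress_cost(gb_read_per_month: float,
                         rate_per_gb: float,
                         same_region: bool) -> float:
    """Estimate monthly egress charges for data read by recipients.

    gb_read_per_month: volume recipients actually read, in GB
    rate_per_gb: cloud provider's egress price per GB (assumed input)
    same_region: True when provider and recipient are in the same region
    """
    if gb_read_per_month < 0 or rate_per_gb < 0:
        raise ValueError("volume and rate must be non-negative")
    if same_region:
        # Per the reply above, same-region sharing incurs no egress cost.
        return 0.0
    return gb_read_per_month * rate_per_gb


# Hypothetical example: 500 GB read cross-region at a placeholder rate.
cross_region = estimate_egress_cost(500, 0.09, same_region=False)
in_region = estimate_egress_cost(500, 0.09, same_region=True)
```

The estimate scales with how much recipients read, not with how much is stored, so reducing the volume scanned per query lowers it directly.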

Optimization Strategies:

  1. File Coalescing:

    • Use the OPTIMIZE command to coalesce small files into larger ones.
    • Improves read query performance.
    • Consider running it daily (preferably at night) to balance cost and performance.
    • Learn more.
  2. Predictive Optimization:

    • Automatically runs OPTIMIZE for Delta tables.
    • Learn more.
  3. Clustering:

    • Use clustering for Delta table layout (Databricks Runtime 13.3+).
    • Learn more.
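The file coalescing and clustering steps above could be scripted roughly as follows. This is a sketch, assuming it runs in a Databricks notebook or job where a `spark` session is available; the table and column names are hypothetical, and the helper functions are illustrative rather than part of any Databricks API.

```python
import re

# Accept simple one-, two-, or three-part names such as catalog.schema.table.
_TABLE_NAME = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*){0,2}$")
_COLUMN_NAME = re.compile(r"^[A-Za-z_]\w*$")


def layout_statements(table: str, cluster_columns=()) -> list:
    """Return the SQL statements for clustering (optional) and file compaction."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    for col in cluster_columns:
        if not _COLUMN_NAME.match(col):
            raise ValueError(f"unexpected column name: {col!r}")

    statements = []
    if cluster_columns:
        # Set clustering keys for the table layout.
        statements.append(f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_columns)})")
    # Coalesce small files into larger ones.
    statements.append(f"OPTIMIZE {table}")
    return statements


def run_layout_maintenance(spark, table: str, cluster_columns=()) -> None:
    """Execute the statements on the given Spark session, e.g. from a nightly job."""
    for statement in layout_statements(table, cluster_columns):
        spark.sql(statement)
```

Unlike the share itself, these maintenance commands do run on compute, so scheduling them (for example nightly, as suggested above) is where the cost and performance trade-off is made.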

Remember, Delta Sharing empowers secure data collaboration, and thoughtful configuration and optimization ensure efficient and cost-effective sharing. Feel free to explore the provided links for more details! 🚀🔗

 

RajNath
New Contributor II

Just to rephrase my understanding, Delta Sharing cost has two parts:

  1. Egress
  2. Initial setup?

What is the cost associated with the initial setup? If I am not wrong, Databricks costs are measured in DBUs, but once I am done processing the data and storing it in a Delta table, will using Delta Sharing alone cost me anything (excluding egress)?