The future of data and AI sharing should be open, secure, and seamless, not locked into costly proprietary systems.
What is Delta Sharing?
Delta Sharing is the world’s first open protocol for secure, real-time data sharing at scale, developed by Databricks and The Linux Foundation. It allows you to share live data directly from your lakehouse without replication or movement. Recipients always see the latest version of the data and can access it using a wide range of tools, including:
- Azure Databricks
- Apache Spark
- Pandas
- Power BI
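As a quick illustration, here is a minimal recipient-side sketch using the open-source delta-sharing Python connector; the profile file name and the share, schema, and table names below are placeholders.

```python
# Minimal recipient-side sketch with the open-source `delta-sharing` connector.
# "config.share" is the credential/profile file the provider gives you; the
# share, schema, and table names are placeholders.
import delta_sharing

profile = "config.share"
table_url = f"{profile}#my_share.my_schema.my_table"  # <profile>#<share>.<schema>.<table>

# Read the live shared table straight into a pandas DataFrame.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())

# In a Spark environment with the connector installed, the same table can be
# loaded as a Spark DataFrame instead:
# spark_df = delta_sharing.load_as_spark(table_url)
```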
What is Open Delta Sharing?
If you want to share data with users outside your Databricks workspace, regardless of the platform they use, Open Delta Sharing provides secure, flexible options for authentication:
- Bearer Token: Generate a long-lived token and share it securely. Recipients use it to access the tables you’ve shared.
- OpenID Connect (OIDC) Federation: Issue short-lived Databricks OAuth tokens in exchange for JWT tokens from the recipient’s identity provider.
Why it Matters
1. Supports open formats: Delta Lake, Apache Parquet, Apache Iceberg™.
2. Reduces time-to-value: query, transform, or enrich shared data with your favorite tools.
3. Truly open and scalable: share across clouds, platforms, and even on-premises.
Did you know that many teams traditionally relied on external tables just to share data with other teams? With Databricks managed tables, there is a more efficient way to handle data: you can manage it efficiently and share it with others faster, eliminating the time and cost of copying data.
How Delta Sharing Works
1. Authentication & Request: The client authenticates and requests a dataset (filters optional, e.g., country=US).
2. Authorization & Logging: The server checks access, logs the request, and identifies which files to share.
3. Temporary Access via Presigned URLs: Short-lived, read-only links point directly to files in cloud storage (AWS S3, Azure ADLS, GCP GCS).
Result: Recipients download data directly at high speed, providers maintain governance, and the server never becomes a bottleneck.
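For illustration, here is a rough Python sketch of that request/response flow against the open Delta Sharing REST protocol. The endpoint, token, and share/schema/table names are placeholders, and the endpoint paths and response field names follow the open protocol spec as published; verify them against your server version.

```python
# Rough sketch of the open Delta Sharing protocol flow (placeholders throughout).
import json
import requests

endpoint = "https://sharing.example.com/delta-sharing"   # from the profile file
headers = {"Authorization": "Bearer <recipient-token>"}  # bearer token auth

# 1. Authentication & request: ask the server for the files behind a table,
#    optionally passing filters as predicate hints (e.g. country=US).
resp = requests.post(
    f"{endpoint}/shares/my_share/schemas/my_schema/tables/my_table/query",
    headers=headers,
    json={"predicateHints": ["country = 'US'"]},
)
resp.raise_for_status()

# 2. The server authorizes and logs the request, then responds with one JSON
#    object per line: protocol, table metadata, then one entry per data file.
entries = [json.loads(line) for line in resp.text.splitlines() if line]

# 3. Temporary access: each file entry carries a short-lived presigned URL the
#    client downloads directly from cloud storage.
for entry in entries:
    if "file" in entry:
        print(entry["file"]["url"])
```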
Key Benefits
Open & Cross-Platform: Share across clouds, platforms, and on-premises with open-source connectors (pandas, Spark, Python, Elixir, and more).
Secure & Live: Real-time access without replication.
Centralized Governance: Unity Catalog ensures fine-grained access, auditing, and compliance.
Beyond Tables: Share AI models, streams, dashboards, notebooks, and unstructured data, not just tabular data.
Lower Costs: Avoid duplication, reduce egress fees, and leverage efficient cloud storage.
Faster Time-to-Value: Query fresh data instantly, no extra ingestion pipelines needed.
Enables Databricks Marketplace and Clean Rooms: Facilitates privacy-preserving data sharing and AI collaboration.
With Delta Sharing, organizations can collaborate securely and efficiently, delivering real-time data access while maintaining control and compliance.
Top 3 Use Cases
1. Internal Line of Business Sharing: Build a data mesh with Delta Sharing to securely share data with business units and subsidiaries across clouds or regions without copying or replicating the data.
2. B2B Sharing: Securely share data with your partners and suppliers without requiring them to be on the Databricks Platform.
3. Data Monetization: Distribute and monetize data products (datasets, machine learning models, and dashboards) without requiring customers to be on the Databricks Platform.
Databricks supports four primary approaches for sharing data:
1. Databricks-to-Databricks Delta Sharing (For Providers)
- Share data from your Unity Catalog-enabled workspace with users who also have access to a Unity Catalog-enabled workspace.
- Benefits include built-in Delta Sharing server support, Unity Catalog governance, auditing, and usage tracking.
- Simplifies setup and improves performance for both providers and recipients.
High-level steps for sharing data securely:
- Recipient Provides Sharing Identifier: A recipient gives the provider a unique sharing identifier from their Unity Catalog metastore. This identifier enables secure access to the provider’s shared data.
- Provider Creates a Share: In the provider’s Unity Catalog metastore, create a share containing tables, views, volumes, and notebooks.
- Provider Creates a Recipient Object: Represents the user or group accessing the share. Includes the recipient’s Unity Catalog sharing identifier to establish a secure connection.
- Grant Access to the Share: Provide the recipient access to the share.
- Access in Recipient Workspace: Users can access shared data via Catalog Explorer, Databricks CLI, SQL commands in notebooks, or Databricks SQL editor. Metastore admins or privileged users must create a catalog from the share and assign read access. Shared notebooks are accessible to anyone with the USE CATALOG privilege on the catalog.
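For illustration, here is a minimal sketch of those steps as SQL run from Databricks notebooks (where the `spark` session is predefined). The share, recipient, catalog, table, and group names are placeholders, and the sharing identifier comes from the recipient's Unity Catalog metastore; exact privileges may vary with your workspace setup.

```python
# Provider side: create a share, add a table, register the recipient using the
# sharing identifier they supplied, and grant access to the share.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE prod.sales.orders")
spark.sql("CREATE RECIPIENT IF NOT EXISTS finance_bu USING ID '<recipient-sharing-identifier>'")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT finance_bu")

# Recipient side: a metastore admin (or privileged user) mounts the share as a
# catalog and grants read access to their users.
spark.sql("CREATE CATALOG IF NOT EXISTS sales_shared USING SHARE provider_name.sales_share")
spark.sql("GRANT USE CATALOG, SELECT ON CATALOG sales_shared TO `data-analysts`")
```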
Improve Table Read Performance with History Sharing
- Enable history sharing for tables to improve read performance.
- Uses temporary, scoped-down security credentials, resulting in performance similar to direct table access.
- New table shares: Specify WITH HISTORY when creating the share (default for Databricks Runtime 16.2+).
- Existing shares: Alter the share to include table history.
- Entire schema shares: Tables are shared with history by default.
Note: Partitioned tables do not benefit from history sharing.
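As a short sketch of the options above, the following shows history sharing enabled from a Databricks notebook; the share and table names are placeholders, and the exact ALTER SHARE syntax may vary with your Databricks Runtime version.

```python
# Add a new table to a share with history included.
spark.sql("ALTER SHARE sales_share ADD TABLE prod.sales.orders WITH HISTORY")

# Update an existing table share to include history.
spark.sql("ALTER SHARE sales_share ALTER TABLE prod.sales.orders WITH HISTORY")
```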
Data Privacy Considerations
- History sharing grants temporary read access to both data files and Delta logs.
- Delta logs include commit history, committer info, and deleted data that has not been vacuumed.
2. Databricks Open Sharing Protocol
- Share data with users on any computing platform, not limited to Databricks.
High-level workflow for sharing data via the open sharing protocol:
- Create a Recipient: Represents a user or group of users you want to share data with.
- Set up authentication using either:
  - Bearer Token: Generate a long-lived token, a credential file, and an activation link to share securely with recipients.
  - OIDC Federation: Recipients authenticate via their identity provider (IdP) using short-lived tokens. See: Bearer Tokens and OIDC Federation.
- Create a Share: A named object containing a collection of tables registered in your Unity Catalog metastore.
- Grant Access to the Recipient: Provide the recipient access to the share. In the bearer token flow, recipients download the credential file via the activation link to establish a secure connection. In the OIDC flow, recipients authenticate directly through their IdP.
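For illustration, a minimal sketch of the bearer-token flow above as SQL run from a Databricks notebook; the recipient, share, and table names are placeholders. Creating a recipient without a sharing identifier is what produces the activation link and credential file.

```python
# Provider side: create an open-sharing recipient, a share, and grant access.
spark.sql("CREATE RECIPIENT IF NOT EXISTS external_partner COMMENT 'Partner outside Databricks'")
spark.sql("CREATE SHARE IF NOT EXISTS partner_share")
spark.sql("ALTER SHARE partner_share ADD TABLE prod.marketing.campaigns")
spark.sql("GRANT SELECT ON SHARE partner_share TO RECIPIENT external_partner")

# The activation link for downloading the credential file appears in the
# recipient's details.
spark.sql("DESCRIBE RECIPIENT external_partner").show(truncate=False)
```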
Provider-Specific Configurations
- Many organizations have custom Delta Sharing setups (e.g., Amperity, Atlassian, Oracle).
- Follow provider-specific instructions for configuration and security.
Security and Best Practices
- Token Management: Configure recipient tokens to expire; avoid long-lived tokens where possible.
- Credential Security: Encourage recipients to manage downloaded credential files securely.
- Network Security: Assign IP access lists to restrict recipient access to trusted networks.
- Cross-Cloud Support: Open sharing supports cross-cloud environments (e.g., AWS → GovCloud, Azure China).
3. Customer-Managed Open-Source Delta Sharing Server
- Host the open-source Delta Sharing reference server yourself to share data from any platform; you are responsible for maintaining the server, restricting storage access, and managing recipient credentials (see "Choose Between Open Source and Managed Versions" below).
4. The SAP Business Data Cloud (BDC) Connector for Databricks
The SAP BDC Connector lets you share data between your Unity Catalog-enabled workspace and an SAP BDC account, using Delta Sharing for live, zero-copy access to SAP BDC data products.
The SAP BDC Connector for Databricks enables seamless, secure, and live access to SAP Business Data Cloud (BDC) data directly from your Databricks workspace. By leveraging this connector, organizations can break down data silos, reduce the complexity and cost of traditional data extraction, and analyze SAP BDC data alongside other data sources — all within a Unity Catalog-enabled workspace.
The connector integrates with Databricks using Delta Sharing, providing zero-copy access to SAP BDC data products. This ensures that data remains in place while supporting full governance and auditing via Unity Catalog. For secure exchanges, it implements protocols such as mutual TLS (mTLS) and OpenID Connect (OIDC).
How to Share Data Between Databricks and SAP BDC
To enable data sharing, a Databricks workspace admin (with CREATE PROVIDER and CREATE RECIPIENT privileges) and an SAP BDC admin must first establish a secure connection:
- Prepare the Databricks Workspace: Ensure your workspace is Unity Catalog-enabled and set up Delta Sharing.
- Create a Connection: The Databricks admin sends their connection identifier to the SAP BDC admin. The SAP BDC admin sets up a Third Party Connection and returns an invitation link, which the Databricks admin uses to complete the setup.
- Share Data Using Delta Sharing: Create a catalog from a share. Grant SAP BDC recipients access to Delta Sharing data shares.
Usage Data Sharing with SAP
When using the SAP BDC Connector, Databricks may share usage and operational information with SAP for administrative and billing purposes. This may include:
- Volume of SAP BDC data relative to non-SAP data in workloads
- Date and time of workloads
- Your organization’s effective Databricks consumption costs
All information is shared per-workload on an unaggregated and unanonymized basis.
Delta Sharing Security Best Practices
Sharing data securely is critical, especially when dealing with sensitive information. Below are our recommended best practices for using Delta Sharing effectively and safely:
1. Choose Between Open Source and Managed Versions
Delta Sharing was built with security in mind, but there are key advantages to using the managed version with Unity Catalog:
- Provides fine-grained access control centrally, simplifying permissions management across multiple datasets and recipients.
- Reduces operational overhead: no need to maintain sharing servers or underlying storage access restrictions.
- Built-in audit logging and governance.
- Share management is simplified with SQL syntax and REST APIs.
- Scalability and uptime are handled for you.
Recommendation: Evaluate your requirements. If ease of setup, out-of-the-box governance, auditing, and managed service are important, the managed version is preferable.
2. Set Appropriate Recipient Token Lifetimes
- Tokens grant access to Delta Shares.
- Avoid setting token lifetime to 0 (never expires), as this is a security risk.
- Use short-lived tokens for regulatory, compliance, and risk management purposes.
- It’s easier to issue a new token than to investigate misuse of an unlimited token.
Token lifetime can be configured in seconds, minutes, hours, or days; see the AWS and Azure documentation for configuration options.
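For illustration only, here is a hedged sketch of setting the recipient token lifetime at the metastore level with the Databricks SDK for Python; the field name and update call reflect the Unity Catalog metastore API as I understand it, and the 7-day value is a placeholder, so verify against your SDK/API version.

```python
# Hedged sketch: configure a finite recipient token lifetime on the metastore.
# Assumes Delta Sharing is already enabled on this metastore.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads credentials from the environment / .databrickscfg
assignment = w.metastores.current()  # metastore assigned to this workspace

w.metastores.update(
    id=assignment.metastore_id,
    delta_sharing_recipient_token_lifetime_in_seconds=7 * 24 * 3600,  # 7 days
)
```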
3. Rotate Credentials Regularly
Reasons to rotate credentials:
- Tokens have expired.
- Credentials may have been compromised.
- Changes to token lifetimes require new credentials.
Best Practices:
- Establish a process with clear responsibilities and SLAs.
- Immediately revoke compromised credentials.
- Generate new tokens using --existing-token-expire-in-seconds 0 to invalidate old ones.
- Share new activation links securely and grant access once verified.
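A hedged sketch of that rotation step using the Databricks SDK for Python follows; the recipient name is a placeholder, and the `recipients.rotate_token` call is the SDK counterpart of the CLI flag above, so check your SDK version for the exact signature.

```python
# Hedged sketch: rotate a recipient's token and revoke the old one immediately.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

rotated = w.recipients.rotate_token(
    name="external_partner",              # placeholder recipient name
    existing_token_expire_in_seconds=0,   # invalidate the old token right away
)
for token in rotated.tokens or []:
    print(token.activation_url)           # share this link with the recipient securely
```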
4. Implement Granular Access Controls
- A share can contain multiple tables and be associated with multiple recipients.
- Use fine-grained controls and partition-level sharing where possible.
- Follow the principle of least privilege: if a token is compromised, only minimal data is exposed.
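As a minimal sketch of partition-level sharing in line with least privilege, the statement below shares only one partition of a table; the share, table, alias, and partition value are placeholders, the table must be partitioned on the referenced column, and the ALTER SHARE syntax should be checked for your runtime version.

```python
# Share only the US partition of a table under a separate alias.
spark.sql("""
    ALTER SHARE partner_share
    ADD TABLE prod.sales.orders
    PARTITION (country = 'US')
    AS sales.orders_us
""")
```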
5. Configure IP Access Lists
- Delta Sharing tokens alone can grant access; add network-level security.
- Restrict access to trusted IPs (e.g., corporate VPNs).
- Unauthorized access now requires both the token and network access, greatly improving security.
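For illustration, a hedged sketch of attaching an IP access list to a recipient with the Databricks SDK for Python; the recipient name and CIDR range are placeholders, and the import path and field names are assumptions to verify against your SDK version.

```python
# Hedged sketch: restrict a recipient to a trusted network range.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sharing import IpAccessList

w = WorkspaceClient()
w.recipients.update(
    name="external_partner",  # placeholder recipient name
    ip_access_list=IpAccessList(allowed_ip_addresses=["203.0.113.0/24"]),  # trusted CIDR
)
```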
6. Enable Databricks Audit Logging
- Audit logs track all Delta Sharing activity.
- Set up pipelines to monitor events and trigger alerts for suspicious activity.
Key questions to monitor:
- Most popular Delta Shares.
- Geographical access patterns.
- Access attempts outside IP restrictions.
- Failed authentication attempts.
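As a hedged sketch of this kind of monitoring, the query below pulls recent Delta Sharing events from the audit log system table; it assumes system tables are enabled in your account, and the column and action names (e.g., the "deltaSharing" prefix) reflect the commonly documented audit log schema, so adjust them to what you see in your environment.

```python
# Hedged sketch: recent Delta Sharing activity from the audit log system table,
# run in a Databricks notebook where `spark` is predefined.
recent_sharing_events = spark.sql("""
    SELECT event_time, action_name, source_ip_address,
           request_params['share'] AS share_name
    FROM system.access.audit
    WHERE action_name LIKE 'deltaSharing%'
      AND event_date >= date_sub(current_date(), 7)
    ORDER BY event_time DESC
""")
recent_sharing_events.show(truncate=False)
```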
7. Apply Network Restrictions on Storage Accounts
Delta Sharing clients access data via short-lived credentials directly from storage. Ensure storage-level protections match share-level protections:
Azure:
- Use Managed Identities for Unity Catalog.
- Configure storage firewalls and private endpoints for trusted access.
AWS:
- Use S3 bucket policies to restrict access to trusted IPs and VPCs.
- Ensure only Unity Catalog IAM roles (and a small set of administrators) have bucket access.
Example Deny policy (AWS):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAccessFromUntrustedNetworks",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::<bucket>", "arn:aws:s3:::<bucket>/*"],
      "Condition": {
        "NotIpAddressIfExists": {
          "aws:SourceIp": ["<trusted_ip_1>", "<trusted_ip_2>"]
        },
        "StringNotEqualsIfExists": {
          "aws:SourceVpc": ["<allowed_vpc_id>"]
        }
      }
    }
  ]
}
8. Enable Logging on Storage Accounts
- Monitor attempts to bypass network-level restrictions.
- AWS: Enable S3 server access logging with alerts.
- Azure: Enable Diagnostic logging with monitoring pipelines.
Conclusion
Delta Sharing simplifies secure data exchange within and across organizations:
- Open but secure: Share data safely in real time, independent of platform.
- Enterprise scale: Reduce cost and complexity of data sharing.
- Value creation: Databricks Marketplace helps reach more consumers, reduce costs, and maximize business value.
By combining Delta Sharing with these best practices, organizations can achieve secure, efficient, and scalable data collaboration across teams, partners, and customers.