Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Sharing an external table to external users who have no access to the external storage?

tonykun_sg
New Contributor II

We used Delta Sharing (authentication type: token) to generate a config.share file and shared it with external users outside our organisation. The users hit a "FileNotFoundError" while using the Python delta_sharing.load_as_pandas method to read an external table (underlying storage in Azure ADLS Gen2), apparently because they have no access to our private ADLS Gen2. We tried whitelisting their IPs in the storage networking settings, but it didn't help. How can we resolve this?

5 REPLIES

Alberto_Umana
Databricks Employee

Hi @tonykun_sg,

I don't think this is a credentials issue at this point; otherwise it would have failed with a "permission denied" error.

It looks like this might be the problem: the profile URL config.share#tony-sharing-test.bronze.saps4_config_dd031_t_d might be incorrect.

tonykun_sg
New Contributor II

Hi @Alberto_Umana,

I used a similar approach to the one @Isi suggested: I put the config.share file in a local directory, and I can list the table successfully. However, the error occurred while loading it as a pandas DataFrame, indicating an access issue with the underlying storage. Thanks for the response.

Isi
New Contributor II

Hey @tonykun_sg,

I'm sharing a complete example of Delta Sharing with you 🙂

import delta_sharing
import json

# Point to the profile file. It can be a file on the local file system or a file on a remote storage.
profile_file = "config.share"

# Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)

# Show tables
tables = client.list_all_tables()
for t in tables:
    print(t)

# Read table
table_url = profile_file + "#<share-name>.<schema-name>.<table-name>"
df_table = delta_sharing.load_as_pandas(table_url)
print(df_table.head())

# Read using predicate hints (jsonPredicateHints) so filters are applied server-side

predicate_hints = {
    "op": "and",
    "children": [
        {
            "op": "equal",
            "children": [
                {"op": "column", "name": "is_outlet", "valueType": "int"},
                {"op": "literal", "value": 0, "valueType": "int"}
            ]
        },
        {
            "op": "equal",
            "children": [
                {"op": "column", "name": "iso_country", "valueType": "string"},
                {"op": "literal", "value": "AD", "valueType": "string"}
            ]
        }
    ]
}

predicate_hints_json = json.dumps(predicate_hints)

df_table_2 = delta_sharing.load_as_pandas(
    table_url,
    jsonPredicateHints=predicate_hints_json
)

print(df_table_2.head())

In my example, the config.share file sits in the same local path. Alternatively, you could store the profile in Azure Key Vault, retrieve the value at runtime, and write it to a temporary file, so the file itself never needs to be distributed.
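
A minimal sketch of that Key Vault approach, assuming the profile JSON is stored as a secret; the vault URL and secret name below are placeholders, not values from this thread:

import tempfile

import delta_sharing
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault URL and secret name -- replace with your own.
vault_url = "https://<your-vault-name>.vault.azure.net"
secret_client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
profile_json = secret_client.get_secret("delta-sharing-profile").value

# Write the profile to a temporary .share file that delta_sharing can read.
with tempfile.NamedTemporaryFile(mode="w", suffix=".share", delete=False) as f:
    f.write(profile_json)
    profile_file = f.name

client = delta_sharing.SharingClient(profile_file)
print(client.list_all_tables())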


Hope this helps 🙂

tonykun_sg
New Contributor II

Hi @Isi,

Thanks for sharing your script. I can list the tables created in the share; however, loading the data with the load_as_pandas method was blocked when my VPN was disabled. Once the VPN was enabled, I could reach the private storage and read the dataset successfully. So it seems the external user must also have direct access to the storage; is my understanding correct?

Isi
New Contributor II

Hello @tonykun_sg,

It looks like ADLS Gen2 might be restricting access to the data through an ACL, which would explain why Databricks itself can read the table while the underlying files remain protected. Could you check with your team whether access can be temporarily enabled for testing?
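
If it helps with that check, here is a rough sketch, using the azure-storage-file-datalake SDK, of how the ACL on the shared table's directory could be inspected; the storage account, container, and path are placeholders:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders -- point these at the storage account backing the share.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = (
    service.get_file_system_client("<container>")
    .get_directory_client("<path/to/delta/table>")
)

# Returns the owner, group, permissions, and ACL entries for the directory.
acl = directory.get_access_control()
print(acl["acl"])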

Another option to consider is creating a Service Principal with a token that has access to a Databricks SQL Warehouse. The user could authenticate as the Service Principal, send queries to the warehouse, and retrieve the data, all without needing direct access to the storage.
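
For example, with the databricks-sql-connector package the external user could query the warehouse directly. This is only a sketch; the hostname, HTTP path, token, and table name are placeholders:

from databricks import sql

# Placeholders -- use your workspace hostname, warehouse HTTP path, and a
# token issued for the service principal.
with sql.connect(
    server_hostname="<workspace-host>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<service-principal-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <catalog>.<schema>.<table> LIMIT 10")
        for row in cursor.fetchall():
            print(row)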

Additionally, permissions can be refined in Unity Catalog to allow access only to the selected tables, ensuring the user gets strictly the necessary datasets, as sketched below.
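
Those grants could look roughly like this, run by a workspace admin against the same warehouse; the catalog, schema, table, and service principal application ID are placeholders:

from databricks import sql

# Placeholders -- run as a workspace admin; grants go to the service
# principal's application ID.
with sql.connect(
    server_hostname="<workspace-host>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<admin-token>",
) as connection:
    with connection.cursor() as cursor:
        # USE CATALOG and USE SCHEMA are required before SELECT is usable.
        cursor.execute("GRANT USE CATALOG ON CATALOG <catalog> TO `<sp-application-id>`")
        cursor.execute("GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<sp-application-id>`")
        cursor.execute("GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `<sp-application-id>`")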

This approach wouldn't even require a Python script; the user could do everything through DBeaver or a similar SQL client. Let me know if this sounds like a viable workaround for your case!

🙂
