2 weeks ago
We used Delta Sharing (authentication type: token) to generate a config.share file and shared it with external users outside our organisation. When those users call the Python "delta_sharing.load_as_pandas" method to read a shared table (underlying storage in Azure ADLS Gen2), they hit a "FileNotFoundError". It looks like this happens because the users have no access to our private ADLS Gen2 account. We attempted to whitelist their IPs in the storage account's networking settings, but it made no difference. How can we resolve this?
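Roughly, the failing call on the consumer side looks like this (the share/schema/table names below are placeholders, not our real names):

import delta_sharing

# Profile file generated on our side with token authentication
profile_file = "config.share"

# <share>.<schema>.<table> is a placeholder for the actual three-part name
table_url = profile_file + "#<share>.<schema>.<table>"

# This is where the external users get FileNotFoundError
df = delta_sharing.load_as_pandas(table_url)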
2 weeks ago
Hi @tonykun_sg,
I don't think this is a credentials issue at this point; otherwise it would have failed with a permission-denied error.
The table URL might be the problem: double-check that config.share#tony-sharing-test.bronze.saps4_config_dd031_t_d exactly matches the share, schema, and table names.
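One quick way to confirm the exact three-part name is to list the tables from the same profile first; a small sketch:

import delta_sharing

client = delta_sharing.SharingClient("config.share")
# Print every table exposed by the share as share.schema.table,
# so you can copy the exact string into the table URL
for t in client.list_all_tables():
    print(f"{t.share}.{t.schema}.{t.name}")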
2 weeks ago
Hi Albert
I used a similar approach to what @Isi suggested: I put the config.share file in a local directory and can list the tables successfully. But the error occurred while loading the table as a pandas DataFrame, indicating an access issue with the underlying storage. Thanks for the response.
2 weeks ago
Hey @tonykun_sg,
I'm sharing a complete Delta Sharing example with you 🙂
import delta_sharing
import json

# Point to the profile file. It can be a file on the local file system
# or a file on a remote storage.
profile_file = "config.share"

# Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)

# Show tables
tables = client.list_all_tables()
for t in tables:
    print(t)

# Read table
table_url = profile_file + "#<share-name>.<schema-name>.<table-name>"
df_table = delta_sharing.load_as_pandas(table_url)
print(df_table.head())

# Read using table predicates example
predicate_hints = {
    "op": "and",
    "children": [
        {
            "op": "equal",
            "children": [
                {"op": "column", "name": "is_outlet", "valueType": "int"},
                {"op": "literal", "value": 0, "valueType": "int"},
            ],
        },
        {
            "op": "equal",
            "children": [
                {"op": "column", "name": "iso_country", "valueType": "string"},
                {"op": "literal", "value": "AD", "valueType": "string"},
            ],
        },
    ],
}
predicate_hints_json = json.dumps(predicate_hints)

df_table_2 = delta_sharing.load_as_pandas(
    table_url,
    jsonPredicateHints=predicate_hints_json,
)
print(df_table_2.head())
In my example the config.share file sits in the same local path as the script. To avoid shipping the file around, you could store its contents in Azure Key Vault, retrieve the value at runtime, and write it to a temporary file.
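A minimal sketch of that Key Vault pattern, assuming the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name here are placeholders:

import tempfile

import delta_sharing
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name; replace with your own.
vault_url = "https://<your-vault-name>.vault.azure.net"
secret_name = "delta-sharing-profile"

# Fetch the config.share contents stored as a Key Vault secret.
secret_client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
profile_json = secret_client.get_secret(secret_name).value

# Write the profile to a temporary file and use it as usual.
with tempfile.NamedTemporaryFile(mode="w", suffix=".share", delete=False) as f:
    f.write(profile_json)
    profile_file = f.name

# From here, list tables / load_as_pandas exactly as in the example above.
client = delta_sharing.SharingClient(profile_file)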
Hope this helps 🙂
2 weeks ago
Hi @Isi,
Thanks for sharing your script. I can list the tables created in the share; however, loading the data with the "load_as_pandas" method was blocked whenever I disabled my VPN. Once the VPN was enabled, I could reach the private storage and the dataset was read successfully. So it seems the external user must also have direct network access to the storage. I'm not sure if my understanding is correct.
2 weeks ago
Hello @tonykun_sg,
It looks like ADLS Gen2 is restricting access to the data through an ACL, which is why Databricks allows access but the underlying files remain protected. Could you check with your team to temporarily enable access for testing?
Another option to consider is creating a Service Principal with a token that has access to a Databricks SQL Warehouse. The user could authenticate as the Service Principal, send a query to the warehouse, and retrieve the data, all without needing direct access to the storage (see the sketch below).
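A rough sketch of that pattern, assuming the databricks-sql-connector package; the hostname, HTTP path, token, and table name are placeholders:

from databricks import sql

# Placeholder connection details, taken from the SQL Warehouse's
# "Connection details" tab, with a token issued for the Service Principal.
with sql.connect(
    server_hostname="<workspace-host>.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<service-principal-token>",
) as connection:
    with connection.cursor() as cursor:
        # The warehouse reads the table on the user's behalf, so the client
        # never needs network access to the ADLS Gen2 account.
        cursor.execute("SELECT * FROM <catalog>.<schema>.<table> LIMIT 10")
        for row in cursor.fetchall():
            print(row)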
Additionally, permissions can be refined in Unity Catalog to grant access only to the selected tables, ensuring that the user gets access strictly to the necessary datasets.
This approach wouldn't even require a Python script; the user could do everything through DBeaver or a similar SQL client. Let me know if this sounds like a viable workaround for your case!
🙂