06-11-2024 04:08 AM
Hi Team
I have created a recipient under Delta Sharing (Azure Databricks). Unity Catalog is enabled and the data is stored in ADLS Gen2. I have downloaded the credential file and am trying to reuse it in my Python script (as per the Databricks documentation) for a POC from my local machine (Jupyter notebook). The delta_sharing package is installed successfully on my local system.
import delta_sharing

# Path to the downloaded credential (profile) file
profile = "C:/Users/XXXXXX/Downloads/config.share"
client = delta_sharing.SharingClient(profile)
client.list_all_tables()
# Table URL format: <profile-path>#<share>.<schema>.<table>
delta_sharing.load_as_pandas(f"{profile}#<sharename>.<schemaname>.<tablename>")
The tables are listed successfully, but when I try to load a particular table as a pandas DataFrame, I get this error:
FileNotFoundError: https://......................._unitystorage/schemas/............../tables/.............../part-0000...
Can someone suggest why it is failing and how I can resolve it?
06-11-2024 07:41 AM
I wasn't able to reproduce your issue. Is your Delta table operable? Can you see sample data and query the table from within Databricks? It almost looks like some Parquet files are missing, which would make the Delta table unqueryable.
06-11-2024 09:00 AM
Yes, my Delta table is operable: it has sample data and can be queried from a SQL warehouse as well as from notebooks. If I run the script from within the Databricks environment (i.e. the same metastore), the data can be read with Delta Sharing. But if I run it from outside, in a Jupyter notebook, the read fails.
06-11-2024 09:23 AM
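Then I am 100% sure that it is because your Azure Data Lake Storage account does not have public network access enabled, or has a firewall or private endpoint set up. To query Delta shares, the client must be able to reach the storage account where the Delta tables reside. Listing tables only contacts the sharing server, whereas loading a table downloads the Parquet files directly from storage, which is why the listing works and the load fails.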
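One way to check this from the local machine is to request the file URL from the error message directly and look at what comes back. The following is only a rough sketch using the Python standard library: the helper names `storage_host` and `probe` are made up for illustration, and the interpretation of the result is an assumption (an HTTP error code suggests the request reached the storage service and was refused there, while a connection error suggests the machine could not reach the service at all).

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def storage_host(file_url: str) -> str:
    """Return the host name of the storage endpoint in a file URL (hypothetical helper)."""
    return urlsplit(file_url).hostname or ""


def probe(file_url: str, timeout: float = 10.0) -> str:
    """Request the first byte of the file and describe the outcome (hypothetical helper)."""
    request = urllib.request.Request(file_url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"HTTP {response.status}: storage answered this machine"
    except urllib.error.HTTPError as err:
        # The request reached an HTTP endpoint, which answered with an error status.
        return f"HTTP {err.code}: {err.reason}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, timeout, or refused connection: no HTTP response at all.
        return f"no connection: {err}"


# Usage: paste the complete URL from the FileNotFoundError message, e.g.
# url = "<full URL from the error>"
# print(storage_host(url), probe(url))
```

Running the same probe from a machine inside the allowed network and from the local machine should make the difference visible, assuming the URL in the error has not expired in the meantime.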
06-11-2024 09:44 AM
I just reproduced it for you. It is 100% the networking on the Azure storage account.
06-12-2024 04:11 AM
Thanks @jacovangelder for checking this out. I suspected as much. I tested by running the code in a workspace under the same metastore and could read the data; when I ran the same code from a workspace under a different metastore, I hit the same issue. So this test reconfirms it. My system is indeed in a private VNet without any endpoints created (as I later confirmed with my networking team).