Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Delta sharing open protocol in Unity catalog: FileNotFoundError

kiko_roy
Contributor

Hi Team

I have created a recipient under Delta Sharing (Azure Databricks). Unity Catalog is enabled and the data is stored in ADLS Gen2. I have downloaded the credential file and am trying to reuse it in my Python script (as per the Databricks documentation) for a POC, running from my local machine (Jupyter notebook). The delta_sharing package is installed successfully on my local system.

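```python
import delta_sharing  # load_as_pandas also needs pandas installed, but not imported

# Credential (profile) file downloaded from the recipient activation link
profile_file = "C:/Users/XXXXXX/Downloads/config.share"

client = delta_sharing.SharingClient(profile_file)
print(client.list_all_tables())

# Table URL format: <profile-file-path>#<share-name>.<schema-name>.<table-name>
df = delta_sharing.load_as_pandas(profile_file + "#<sharename>.<schemaname>.<tablename>")
```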

I can list the tables successfully, but when I try to load a particular table as a pandas DataFrame, I get this error:

FileNotFoundError: https://......................._unitystorage/schemas/............../tables/.............../part-0000...

Can someone suggest why it is failing and how I can resolve it?
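Why does listing work while loading fails? In the Delta Sharing protocol, `list_all_tables()` only calls the sharing server endpoint recorded in the profile file, whereas `load_as_pandas()` also downloads the table's Parquet files directly from cloud storage, using short-lived pre-signed URLs the server returns. The standard-library sketch below extracts the host from each so you can see they are different machines. `PROFILE_JSON` and `FAILING_URL` are hypothetical placeholders (in practice you would read the text of your own `config.share` and copy the URL from your own error); the helper names are illustrative, not part of the `delta_sharing` package.

```python
import json
from urllib.parse import urlparse

# Hypothetical profile contents; a real config.share uses the same JSON keys.
PROFILE_JSON = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://example-region.azuredatabricks.net/api/2.0/delta-sharing/metastores/example-id",
  "bearerToken": "<redacted>"
}"""

# Hypothetical pre-signed data-file URL, like the one in the FileNotFoundError.
FAILING_URL = (
    "https://examplestorageacct.blob.core.windows.net/examplecontainer/"
    "example-path/part-00000.snappy.parquet?sig=example"
)


def sharing_server_host(profile_text: str) -> str:
    """Host contacted by list_all_tables(): the Delta Sharing server."""
    profile = json.loads(profile_text)
    return urlparse(profile["endpoint"]).hostname


def storage_host(file_url: str) -> str:
    """Host contacted by load_as_pandas() to download the Parquet data files."""
    return urlparse(file_url).hostname


if __name__ == "__main__":
    print("sharing server:", sharing_server_host(PROFILE_JSON))
    print("storage       :", storage_host(FAILING_URL))
```

If the first host is reachable from your laptop but the second is not, you get exactly the symptom described: tables list fine, but reading data fails.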

1 ACCEPTED SOLUTION

Accepted Solutions

I just reproduced it for you. It is 100% the networking on the Azure storage account. 

jacovangelder_0-1718124274401.png

 

View solution in original post

5 REPLIES

jacovangelder
Contributor III

I wasn't able to reproduce your issue. Is your Delta table operable? Can you see sample data and query the table from within Databricks? It almost looks like some Parquet files are missing, making your Delta table no longer queryable.

Yes, my Delta table is operable: it has sample data and can be queried from a SQL warehouse as well as from notebooks. If I run the script from within the Databricks environment (i.e., the same metastore), the data can be read with Delta Sharing. But if I run it outside, from a Jupyter notebook, I am unable to read it.

Then I am 100% sure it is because your Azure Data Lake Storage account does not have public network access enabled, or has a firewall or private endpoint set up. To query Delta shares, you need to be able to access the storage account where the Delta tables reside.
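To check this from the machine running the notebook, you can request the first byte of the failing file URL directly and see how the storage service responds. The sketch below uses only Python's standard library; `probe_file_url` is a hypothetical helper, not part of `delta_sharing`. It assumes you paste the full URL from your own error, including the query string, and do so quickly, since pre-signed URLs expire. An HTTP refusal (commonly a 403 when storage network rules block the caller) means the request reached Azure but was denied; no HTTP answer at all points to DNS or connectivity, as with a private endpoint.

```python
import urllib.error
import urllib.request


def probe_file_url(url: str, timeout: float = 10.0) -> str:
    """Fetch the first byte of a pre-signed file URL and describe the outcome."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"OK: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered but refused the request (e.g. a firewall 403).
        return f"Refused: HTTP {err.code}"
    except OSError as err:
        # No HTTP answer at all: DNS failure, timeout, or blocked connection.
        # (URLError is a subclass of OSError, so it is caught here too.)
        return f"Unreachable: {err}"
```

Usage: `print(probe_file_url("<full https://... URL from the FileNotFoundError>"))`. Running it both locally and from a notebook in the provider workspace should show the difference.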

I just reproduced it for you. It is 100% the networking on the Azure storage account. 


Thanks @jacovangelder for checking this out. I suspected as much too. I tested by running the code in a workspace under the same metastore and could read the data; when I ran the same code from a workspace attached to a different metastore, I hit the same issue. So this test reconfirms it. My system is indeed in a private VNet without any endpoints created (as my networking team later confirmed).
