Data Engineering

Open sharing protocol in Databricks notebook

dbx_deltaSharin
New Contributor II

Hello,

I use an Azure Databricks notebook to access Delta Sharing tables through the open sharing protocol. I have uploaded the 'config.share' file to DBFS. When I run the following commands:

import delta_sharing

client = delta_sharing.SharingClient("/dbfs/path/config.share")
client.list_all_tables()

 

 

I can see all table names and schemas. However, when I try to read the data using

spark.read.format("deltaSharing")

 

 

I get the following error:

FileReadException: Error while reading file delta-sharing:/dbfsXXXX.
Caused by: IOException: java.util.concurrent.ExecutionException: io.delta.sharing.spark.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 This request is not authorized to perform this operation. {"error":{"code":"AuthorizationFailure","message":"This request is not authorized to perform this operation.\nRequestId:4b5091fb-e01f-004e-1391-fa30ed000000\nTime:2023-10-09T09:16:31.5069373Z"}}
Caused by: ExecutionException: io.delta.sharing.spark.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 This request is not authorized to perform this operation. {"error":{"code":"AuthorizationFailure","message":"This request is not authorized to perform this operation.\nRequestId:4b5091fb-e01f-004e-1391-fa30ed000000\nTime:2023-10-09T09:16:31.5069373Z"}}

 

For reference, I use the Databricks Standard tier and Runtime 13.1 ML.

Has anyone else experienced the same error?
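
For readers following along: a Delta Sharing read is normally addressed with a table URL of the form `<profile-file>#<share>.<schema>.<table>`. Below is a minimal sketch of building that URL. The helper name `build_table_url` and the share, schema, and table names are placeholders for illustration, not values from this thread; the real names would come from the `client.list_all_tables()` output.

```python
def build_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Return '<profile-path>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Placeholder names (assumptions); substitute the values listed by the client.
url = build_table_url("/dbfs/path/config.share", "my_share", "my_schema", "my_table")
print(url)

# In a notebook, the URL would then be handed to the reader, for example:
#   df = spark.read.format("deltaSharing").load(url)
#   pdf = delta_sharing.load_as_pandas(url)
```

Which form of the profile path the Spark reader accepts can depend on the environment, so treat the path above as an example only.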

1 ACCEPTED SOLUTION


Manisha_Jena
Databricks Employee

Hi @dbx_deltaSharin,

When querying individual partitions, the files are read through an S3 access point location, whereas reading the table as a whole uses the actual S3 bucket name. This information is fetched from the table metadata itself.

It appears that, in the source metastore, the table metadata points to the S3 bucket location, whereas the partitions are defined with the S3 access point location.

Please review the table and partition metadata of the source table. Update the table metadata so that the table location points to the S3 access point, matching what is defined for the partitions.

Also, please review the IAM role that was used. Does it allow access through both the S3 bucket name and the S3 access point name? If one is missing, can it be added? If the restriction is not in the IAM role, access to the actual S3 bucket from outside may be blocked at a higher level (e.g., AWS SCP policies).
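
The metadata review suggested above could be sketched as a simple comparison of the bucket part of each location. This is only an illustration under stated assumptions: the locations are hypothetical strings (in practice they would be collected from the source metastore), and the helper names are made up for this example.

```python
from urllib.parse import urlparse


def bucket_of(location: str) -> str:
    """Return the bucket (or access point alias) part of an s3:// style URI."""
    return urlparse(location).netloc


def mismatched_partitions(table_location: str, partition_locations: list) -> list:
    """List partition locations whose bucket differs from the table's bucket."""
    table_bucket = bucket_of(table_location)
    return [p for p in partition_locations if bucket_of(p) != table_bucket]


# Hypothetical locations (assumptions, not taken from this thread).
table_location = "s3://example-bucket/warehouse/sales"
partition_locations = [
    "s3://example-ap-abc123-s3alias/warehouse/sales/dt=2023-10-01",
    "s3://example-bucket/warehouse/sales/dt=2023-10-02",
]

mismatches = mismatched_partitions(table_location, partition_locations)
print(mismatches)
```

Any location reported here is read through a different name than the table itself, which is the situation described above.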


2 REPLIES

Hi,

Thank you @Retired_mod for responding to my question. For additional information, the 'config.share' file follows this format: 

{"shareCredentialsVersion":1,"bearerToken":"valuexxxx","endpoint":"endpointUrl","expirationTime":"expirationTimeValue"}

The data is shared from another Databricks account. Therefore, I'm wondering how this could be an authorization or permission issue, especially since I can already see all table names and schemas using the same 'config.share' file.
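
As a small aid when checking a profile like the one above, the fields can be inspected without printing the token itself. This is a minimal sketch: the function name `summarize_profile` is hypothetical, and the sample string simply reuses the placeholder values from the format shown above.

```python
import json


def summarize_profile(profile_text: str) -> dict:
    """Return the non-secret fields of a share profile, hiding the token value."""
    profile = json.loads(profile_text)
    return {
        "shareCredentialsVersion": profile.get("shareCredentialsVersion"),
        "endpoint": profile.get("endpoint"),
        "expirationTime": profile.get("expirationTime"),
        # Report only whether a token is present, never its value.
        "hasBearerToken": bool(profile.get("bearerToken")),
    }


# Placeholder profile content, copied from the format quoted in this post.
sample = (
    '{"shareCredentialsVersion":1,"bearerToken":"valuexxxx",'
    '"endpoint":"endpointUrl","expirationTime":"expirationTimeValue"}'
)
summary = summarize_profile(sample)
print(summary)
```

With a real file, the text would be read from the uploaded 'config.share' path instead of the inline sample.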

