Databricks Community

dbx_deltaSharin · ‎10-10-2023

Hello,

I utilize an Azure Databricks notebook to access Delta Sharing tables, employing the open sharing protocol. I've successfully uploaded the 'config.share' file to dbfs. Upon executing the commands:

client = delta_sharing.SharingClient(f"/dbfs/path/config.share")
client.list_all_tables()

I can observe all table names and schemas. However, when I attempt to display the data using

spark.read.format("deltaSharing")

I encounter an error labeled 'error content'.

FileReadException: Error while reading file delta-sharing:/dbfsXXXX.
Caused by: IOException: java.util.concurrent.ExecutionException: io.delta.sharing.spark.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 This request is not authorized to perform this operation. {"error":{"code":"AuthorizationFailure","message":"This request is not authorized to perform this operation.\nRequestId:4b5091fb-e01f-004e-1391-fa30ed000000\nTime:2023-10-09T09:16:31.5069373Z"}}
Caused by: ExecutionException: io.delta.sharing.spark.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 This request is not authorized to perform this operation. {"error":{"code":"AuthorizationFailure","message":"This request is not authorized to perform this operation.\nRequestId:4b5091fb-e01f-004e-1391-fa30ed000000\nTime:2023-10-09T09:16:31.5069373Z"}}

For Details I use Databricks standard version and Runtime 13.1 ML.

Has anyone else experienced the same error?

Manisha_Jena · ‎11-10-2023

Hi @dbx_deltaSharin,

When querying the individual partitions, the files are being read by using an S3 access point location while it is using the actual S3 name when reading the table as a whole. This information is fetched from the table metadata itself.

It appears, in the source metastore, the table metadata is pointed to the s3 location where as the partitions are defined with the s3 access point location.

Please review the table and partition metadata at the source table. Update the table metadata and point the table location to the s3 access point similar to what's defined for the partitions.

Also, please review the IAM role which was used, Is it defined to allow access with both the s3 name as well as the s3 access point name? Can we add if one is missing? If this is not in the IAM role, the restriction to the actual S3 bucket from outside may be on a higher level (eg: AWS SCP policies).

View solution in original post

dbx_deltaSharin · ‎10-10-2023

Hi,

Thank you @Retired_mod for responding to my question. For additional information, the 'config.share' file follows this format:

{"shareCredentialsVersion":1,"bearerToken":"valuexxxx","endpoint":"endpointUrl","expirationTime":"expirationTimeValue"}

The data shared from another Databricks account. Therefore, I'm wondering how it could be an authorization or permission issue, especially since I can already observe all table names and schemas using the same 'config.share' file.

Manisha_Jena · ‎11-10-2023

Hi @dbx_deltaSharin,

When querying the individual partitions, the files are being read by using an S3 access point location while it is using the actual S3 name when reading the table as a whole. This information is fetched from the table metadata itself.

It appears, in the source metastore, the table metadata is pointed to the s3 location where as the partitions are defined with the s3 access point location.

Please review the table and partition metadata at the source table. Update the table metadata and point the table location to the s3 access point similar to what's defined for the partitions.

Also, please review the IAM role which was used, Is it defined to allow access with both the s3 name as well as the s3 access point name? Can we add if one is missing? If this is not in the IAM role, the restriction to the actual S3 bucket from outside may be on a higher level (eg: AWS SCP policies).