Use temporary table credentials to access data in Databricks

matthiasn
New Contributor

Hi everybody,

I tested the temporary table credentials API. It works great, as long as I use the credentials outside of Databricks (e.g. in a local duckdb instance).
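
For readers who have not used the API yet, requesting the credentials can be sketched roughly as follows. This is a minimal sketch, not code from the original post: the endpoint path and the response field names (`azure_user_delegation_sas`, `sas_token`, `url`) are assumptions based on the Unity Catalog REST API and should be checked against the documentation for your workspace. The host, token, and table ID are placeholders.

```python
import json
import urllib.request

# Assumed endpoint path of the Unity Catalog temporary table credentials API.
CREDENTIALS_PATH = "/api/2.0/unity-catalog/temporary-table-credentials"


def build_credentials_request(host, token, table_id, operation="READ"):
    """Build (but do not send) the POST request for short-lived table credentials."""
    body = json.dumps({"table_id": table_id, "operation": operation}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + CREDENTIALS_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_azure_sas(response_json):
    """Return (storage_url, sas_token) from a credentials response (Azure case).

    The field names used here are assumptions; adjust them if your
    workspace returns a different shape.
    """
    sas = response_json.get("azure_user_delegation_sas") or {}
    sas_token = sas.get("sas_token")
    if not sas_token:
        raise ValueError("response contains no Azure SAS token")
    return response_json.get("url"), sas_token
```

Sending the request is then a matter of `urllib.request.urlopen(request)` and passing the decoded JSON body to `extract_azure_sas`.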

But as soon as I try to use the short-lived credentials (Azure SAS for me) inside Databricks, e.g. in a notebook, it doesn't work anymore:

1. duckdb: complains "AzureBlobStorageFileSystem could not open file: unknown error occurred, this could mean the credentials used were wrong."
2. azure-storage-blob python package: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature." 
3. spark, read abfss url directly: "[ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url overlaps with managed storage within 'CheckPathAccess' call."
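
When comparing the three failures above, it can help to confirm that each client is pointed at the same storage account, container, and path that the credential was issued for. The helper below is a sketch for that check only; it does not work around the failures. `split_abfss_url` is a hypothetical helper name, and the derived blob endpoint assumes the public Azure cloud (`blob.core.windows.net`).

```python
from urllib.parse import urlparse


def split_abfss_url(url):
    """Split abfss://<container>@<account>.dfs.core.windows.net/<path> into parts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URL: {url!r}")
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not container or not host:
        raise ValueError(f"expected <container>@<account host> in {url!r}")
    account = host.split(".", 1)[0]
    return {
        "account": account,
        "container": container,
        "path": parsed.path.lstrip("/"),
        # Assumption: public Azure cloud; sovereign clouds use other suffixes.
        "blob_endpoint": f"https://{account}.blob.core.windows.net",
    }
```

The resulting parts are what a non-Spark client such as azure-storage-blob typically needs, together with the SAS token.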

The third one made me think. Is it that, within Databricks, access to known managed storage locations is blocked for all kinds of libraries, even when accessing them with a temporary credential? This would mean temporary credentials can only be used outside of Databricks, and therefore it would not be possible to read the data in Databricks with any engine other than Spark?

And if not: has anybody managed to run duckdb in Databricks, directly accessing the data in the metastore?

(I know that I could always get from pyspark to pandas/polars/arrow/duckdb/.., but I would be interested in skipping pyspark, especially when the amounts of data are rather small.)

2 REPLIES

Walter_C
Databricks Employee

Within Databricks, access to known managed storage locations might be restricted for all kinds of libraries, even when using temporary credentials. This could explain why you are facing issues with DuckDB, the Azure Storage Blob Python package, and Spark when trying to access data with temporary credentials.

If direct access using DuckDB is not feasible, you might consider using Spark to read the data and then converting it to a format that DuckDB can consume. This approach, although not ideal, can help you work around the current limitations.
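
A minimal sketch of that Spark-based workaround, assuming a notebook where `spark` is the active SparkSession and the `duckdb` package is installed. The function name, the view name, and the table name in the usage comment are hypothetical. `toPandas()` collects the whole table to the driver, so this only suits small tables.

```python
def spark_table_to_duckdb(spark, con, table_name, view_name="uc_table"):
    """Read a table through Spark and expose it to DuckDB as a view.

    spark: an active SparkSession (Spark handles the Unity Catalog access).
    con:   a DuckDB connection, e.g. the result of duckdb.connect().
    """
    # Collects the full table to the driver as a pandas DataFrame.
    pdf = spark.table(table_name).toPandas()
    # Register the DataFrame so DuckDB can query it under view_name.
    con.register(view_name, pdf)
    return con


# Usage in a notebook (hypothetical table name):
#   import duckdb
#   con = spark_table_to_duckdb(spark, duckdb.connect(), "main.default.small_table")
#   con.sql("SELECT count(*) FROM uc_table").show()
```

Spark still performs the read here, so this does not answer whether the temporary credentials themselves can be used inside a notebook.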

As I said, I know that using Spark to read the data would work.
It is obvious that there might be restrictions on access. I was asking whether anybody could confirm them, or whether somebody has managed to use temporary credentials to read data inside a notebook.
