01-08-2025 07:20 AM
Hi everybody,
I tested the temporary table credentials API. It works great as long as I use the credentials outside of Databricks (e.g. in a local duckdb instance).
But as soon as I try to use the short-lived credentials (an Azure SAS token in my case) in Databricks, e.g. in a notebook, it doesn't work anymore:
1. duckdb: complains "AzureBlobStorageFileSystem could not open file: unknown error occurred, this could mean the credentials used were wrong."
2. azure-storage-blob python package: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature."
3. spark, reading the abfss url directly: "[ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url overlaps with managed storage within 'CheckPathAccess' call."
The third one made me think. Is it that, within Databricks, access to known managed storage locations is blocked for all kinds of libraries, even when accessing with a temporary credential? That would mean temporary credentials can only be used outside of Databricks, and that it would therefore not be possible to read the data in Databricks with any engine other than Spark.
And if not: has anybody managed to run duckdb in Databricks, directly accessing the data in the metastore?
(I know that I could always go from pyspark to pandas/polars/arrow/duckdb/..., but I would be interested in skipping pyspark, especially when the amount of data is rather small.)
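For reference, this is roughly how I obtain and use the credentials outside of Databricks. It is a simplified sketch: the workspace URL, token, table id and storage names are placeholders, the response field names are written from memory, and the SAS connection-string format may need adjusting, so please double-check against the API docs.

import duckdb
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                 # placeholder PAT
TABLE_ID = "<table-uuid>"                         # placeholder Unity Catalog table id

# Ask Unity Catalog for short-lived READ credentials for the table.
resp = requests.post(
    f"{HOST}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": TABLE_ID, "operation": "READ"},
)
resp.raise_for_status()
cred = resp.json()
sas_token = cred["azure_user_delegation_sas"]["sas_token"]  # Azure SAS for the table location

# Hand the SAS to DuckDB's azure extension and read the Delta table locally.
con = duckdb.connect()
con.execute("INSTALL azure;")
con.execute("LOAD azure;")
con.execute("INSTALL delta;")
con.execute("LOAD delta;")
con.execute(
    "SET azure_storage_connection_string = "
    "'BlobEndpoint=https://<storage-account>.blob.core.windows.net;"
    f"SharedAccessSignature={sas_token}';"
)
print(con.execute(
    "SELECT count(*) FROM delta_scan('az://<container>/<path-to-table>')"
).fetchone())

Run outside of Databricks, this works; run essentially unchanged inside a notebook, it fails with the errors above.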
01-08-2025 07:39 AM
Within Databricks, access to known managed storage locations might be restricted for all kinds of libraries, even when using temporary credentials. This could explain why you are facing issues with DuckDB, the azure-storage-blob Python package, and Spark when trying to access the data.
If direct access using DuckDB is not feasible, you might consider using Spark to read the data and then converting it to a format that DuckDB can consume. This approach, although not ideal, can help you work around the current limitations.
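Something along these lines, for example (a rough sketch; it assumes the table is small enough to collect to the driver, and the table name is a placeholder):

import duckdb

# Read the Unity Catalog table with Spark, collect it to the driver as a pandas
# DataFrame, and query that DataFrame in-process with DuckDB.
pdf = spark.table("main.my_schema.my_table").toPandas()

con = duckdb.connect()
print(con.execute("SELECT count(*) FROM pdf").fetchone())  # DuckDB picks up the pandas frame by name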
01-09-2025 01:03 AM
As I said, I know that it would work using Spark to read the data.
That there might be restrictions blocking the access is obvious; I was asking whether anybody could confirm them, or whether somebody has managed to use temporary credentials to read data inside a notebook.
01-10-2025 10:49 AM
Hello, sure, I understand. We can confirm this, and at least in my research I was not able to find a way to work around this behavior.
a month ago
Hi Walter,
thanks for coming back to this. And thank you for the confirmation, although it is not what I hoped for. It would be really nice to enable the use of temporary table credentials within Databricks to access data without Spark. For me, Databricks isn't so much the managed pyspark runtime anymore, but rather an end-to-end data platform that should support different engines in its notebooks, workflows and apps.
3 weeks ago
As an update:
Querying with duckdb using the tokens inside Databricks is actually possible.
What was needed:
SET azure_transport_option_type = 'curl';
in duckdb.
Afterwards, querying with duckdb worked seamlessly.
I've written about it and also added an example: https://www.codecentric.de/wissens-hub/blog/access-databricks-unitycatalog-from-duckdb
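For completeness, the working setup inside a Databricks notebook looks roughly like this (a sketch: storage account, container and table path are placeholders, the SAS token is the one returned by the temporary table credentials API, and the connection-string format may need adjusting for your setup):

import duckdb

con = duckdb.connect()
con.execute("INSTALL azure;")
con.execute("LOAD azure;")
con.execute("INSTALL delta;")
con.execute("LOAD delta;")

# The crucial part on Databricks: switch the azure extension's transport to curl.
con.execute("SET azure_transport_option_type = 'curl';")

con.execute(
    "SET azure_storage_connection_string = "
    "'BlobEndpoint=https://<storage-account>.blob.core.windows.net;"
    "SharedAccessSignature=<sas-token>';"
)

print(con.execute(
    "SELECT count(*) FROM delta_scan('az://<container>/<path-to-table>')"
).fetchone())

The blog post linked above contains the full end-to-end example.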
3 weeks ago
Hello Matthias, many thanks for sharing this valuable information. It is great to hear your issue got resolved.