01-08-2025 07:20 AM
Hi everybody,
I tested the temporary table credentials API. It works great as long as I use the credentials outside of Databricks (e.g. in a local duckdb instance).
But as soon as I try to use the short-lived credentials (an Azure SAS token in my case) in Databricks, e.g. in a notebook, it doesn't work anymore:
1. duckdb: complains "AzureBlobStorageFileSystem could not open file: unknown error occurred, this could mean the credentials used were wrong."
2. azure-storage-blob python package: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature."
3. spark, reading the abfss url directly: "[ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url overlaps with managed storage within 'CheckPathAccess' call."
The third one made me think. Is it that, within Databricks, access to known managed storage locations is blocked for all kinds of libraries, even when accessing with a temporary credential? That would mean temporary credentials can only be used outside of Databricks, and that it would therefore not be possible to read the data in Databricks with any engine other than Spark.
And if not: has anybody managed to run duckdb in Databricks, directly accessing the data in the metastore?
(I know that I could always go from pyspark to pandas/polars/arrow/duckdb/..., but I would be interested in skipping pyspark, especially when the amount of data is rather small.)
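For reference, this is roughly how I obtain and use the credentials outside of Databricks. It is a simplified sketch: the workspace URL, token, table id and storage names are placeholders, the response field names are written from memory, and the SAS connection-string format may need adjusting, so please double-check against the API docs.

import duckdb
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                 # placeholder PAT
TABLE_ID = "<table-uuid>"                         # placeholder Unity Catalog table id

# Ask Unity Catalog for short-lived READ credentials for the table.
resp = requests.post(
    f"{HOST}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": TABLE_ID, "operation": "READ"},
)
resp.raise_for_status()
cred = resp.json()
sas_token = cred["azure_user_delegation_sas"]["sas_token"]  # Azure SAS for the table location

# Hand the SAS to DuckDB's azure extension and read the Delta table locally.
con = duckdb.connect()
con.execute("INSTALL azure;")
con.execute("LOAD azure;")
con.execute("INSTALL delta;")
con.execute("LOAD delta;")
con.execute(
    "SET azure_storage_connection_string = "
    "'BlobEndpoint=https://<storage-account>.blob.core.windows.net;"
    f"SharedAccessSignature={sas_token}';"
)
print(con.execute(
    "SELECT count(*) FROM delta_scan('az://<container>/<path-to-table>')"
).fetchone())

Run outside of Databricks, this works; run essentially unchanged inside a notebook, it fails with the errors above.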
01-08-2025 07:39 AM
Within Databricks, access to known managed storage locations might be restricted for all kinds of libraries, even when using temporary credentials. This could explain why you are facing issues with DuckDB, the azure-storage-blob Python package, and Spark when trying to access the data.
If direct access using DuckDB is not feasible, you might consider using Spark to read the data and then converting it to a format that DuckDB can consume. This approach, although not ideal, can help you work around the current limitations.
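Something along these lines, for example (a rough sketch; it assumes the table is small enough to collect to the driver, and the table name is a placeholder):

import duckdb

# Read the Unity Catalog table with Spark, collect it to the driver as a pandas
# DataFrame, and query that DataFrame in-process with DuckDB.
pdf = spark.table("main.my_schema.my_table").toPandas()

con = duckdb.connect()
print(con.execute("SELECT count(*) FROM pdf").fetchone())  # DuckDB picks up the pandas frame by name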
01-09-2025 01:03 AM
As I said, I know that it would work using Spark to read the data.
That there might be restrictions blocking the access is obvious; I was asking whether anybody could confirm them, or whether somebody has managed to use temporary credentials to read data inside a notebook.
01-10-2025 10:49 AM
Hello, sure, I understand. We can confirm this, and at least in my research I was not able to find a way to work around this behavior.
a month ago
Hi Walter,
thanks for coming back to this. And thank you for the confirmation, although it is not what I hoped for. It would be really nice to enable the use of temporary table credentials within Databricks to access data without Spark. For me, Databricks isn't so much the managed pyspark runtime anymore, but rather an end-to-end data platform that should support different engines in its notebooks, workflows and apps.
3 weeks ago
As an update:
Querying with duckdb using the tokens inside Databricks is actually possible.
What was needed:
SET azure_transport_option_type = 'curl';
in duckdb.
Afterwards, querying with duckdb worked seamlessly.
I've written about it and also added an example: https://www.codecentric.de/wissens-hub/blog/access-databricks-unitycatalog-from-duckdb
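For completeness, the working setup inside a Databricks notebook looks roughly like this (a sketch: storage account, container and table path are placeholders, the SAS token is the one returned by the temporary table credentials API, and the connection-string format may need adjusting for your setup):

import duckdb

con = duckdb.connect()
con.execute("INSTALL azure;")
con.execute("LOAD azure;")
con.execute("INSTALL delta;")
con.execute("LOAD delta;")

# The crucial part on Databricks: switch the azure extension's transport to curl.
con.execute("SET azure_transport_option_type = 'curl';")

con.execute(
    "SET azure_storage_connection_string = "
    "'BlobEndpoint=https://<storage-account>.blob.core.windows.net;"
    "SharedAccessSignature=<sas-token>';"
)

print(con.execute(
    "SELECT count(*) FROM delta_scan('az://<container>/<path-to-table>')"
).fetchone())

The blog post linked above contains the full end-to-end example.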
3 weeks ago
Hello Matthias, many thanks for sharing this valuable information. It is great to hear your issue got resolved.