
Use Unity External Location with full paths in delta_log

migq2
New Contributor II

I have an external Delta table in Unity Catalog (let's call it mycatalog.myschema.mytable) that consists only of a `_delta_log` directory that I create semi-manually, with the corresponding JSON files that define it.

The JSON files point to parquet files that are not in the same directory as the `_delta_log` but in a different one (it can even be a different Azure storage account; I am on Azure Databricks).

As an example, the JSON could look like this:

```json
{
    "add": {
        "dataChange": true,
        "modificationTime": 1710850923000,
        "partitionValues": {},
        "path": "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
        "size": 12345,
        "stats": "{\"numRecords\":123}",
        "tags": {
            "INSERTION_TIME": "1710850923000000",
            "MAX_INSERTION_TIME": "1710850923000000",
            "MIN_INSERTION_TIME": "1710850923000000",
            "OPTIMIZE_TARGET_SIZE": "268435456"
        }
    }
}
```
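
For context, here is a minimal sketch of what that semi-manual creation looks like; the schema, table id, and paths below are placeholders for illustration, not my real pipeline:

```python
# Minimal sketch (placeholder schema/ids/paths): write the first Delta commit
# file by hand. Delta commit files are newline-delimited JSON, one action per line.
import json
import os

table_root = "/tmp/mytable"  # in reality an abfss:// table root
log_dir = os.path.join(table_root, "_delta_log")
os.makedirs(log_dir, exist_ok=True)

schema_string = json.dumps({
    "type": "struct",
    "fields": [{"name": "id", "type": "long", "nullable": True, "metadata": {}}],
})

actions = [
    {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
    {"metaData": {
        "id": "00000000-0000-0000-0000-000000000000",  # placeholder table id
        "format": {"provider": "parquet", "options": {}},
        "schemaString": schema_string,
        "partitionColumns": [],
        "configuration": {},
        "createdTime": 1710850923000,
    }},
    {"add": {
        "path": "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
        "partitionValues": {},
        "size": 12345,
        "modificationTime": 1710850923000,
        "dataChange": True,
        "stats": json.dumps({"numRecords": 123}),
    }},
]

# One action object per line, in the zero-padded first commit file.
with open(os.path.join(log_dir, "00000000000000000000.json"), "w") as f:
    for action in actions:
        f.write(json.dumps(action) + "\n")
```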

When I try to read my delta table using `spark.sql("SELECT * FROM mycatalog.myschema.mytable")` I get the following error:

RuntimeException: Couldn't initialize file system for path abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet

which suggests Databricks is not trying to access that file through Unity Catalog external locations but through the storage account key.

The path is declared in an external location, and I can access it normally with UC credentials using
`spark.read.load("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/", format="delta")`

 
Is there a way to use UC external locations with a Delta table that uses absolute paths in the `_delta_log`? For security reasons I don't want to add the storage account key to the cluster via the Spark conf `fs.azure.account.key.mystorageaccount.dfs.core.windows.net`.
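
(For reference, this is the pattern I want to avoid; the secret scope and key names below are made up:)

```python
# The approach I want to avoid: granting the whole cluster access with the
# storage account key instead of going through UC external locations.
# Assumes a Databricks notebook where `spark` and `dbutils` are predefined;
# the secret scope/key names are hypothetical.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)
```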

 

5 REPLIES

Kaniz
Community Manager

Hi @migq2,

  • Once a table is created in a specific path, users cannot directly access the files in that path—even if they have privileges on an external location or storage credential.
  • This restriction ensures that users cannot bypass access controls applied to tables by reading files directly from the cloud tenant.
  • Consider a scenario where User U4 has access to the external location (storage account) but does not have access to the table T1. In such cases, the restriction applies, and an error like “PERMISSION_DENIED: trying to access path with conflicting external tables” is raised.
  • Unfortunately, there isn’t a direct way to use Unity Catalog external locations with a Delta table that relies on absolute paths within the _delta_log.
  • If possible, consider using relative paths within the _delta_log instead of absolute paths. This way, the external table can be queried without encountering the file system initialization issue (see the sketch after this list).
  • Continue using spark.read.load with UC credentials to access the data. While this doesn't directly involve the external table, it provides a workaround.
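
As a sketch of the relative-path variant (the file name below is made up), the add action's path then resolves against the table root, inside the location the table itself is authorized for:

```python
# Sketch: the same "add" action, but with a path relative to the table root
# instead of a fully qualified abfss:// URI. The file name is hypothetical.
relative_add = {
    "add": {
        "path": "part-00000-somefile.snappy.parquet",  # resolved against the table root
        "partitionValues": {},
        "size": 12345,
        "modificationTime": 1710850923000,
        "dataChange": True,
        "stats": "{\"numRecords\":123}",
    }
}
```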

migq2
New Contributor II

Thanks for your reply, Kaniz.

I understand your points, but I cannot use relative paths in my _delta_log because the files I need for my delta table are not all under the same path (they might not even be in the same storage account).

Copying them is not an option either, because I am doing this at scale for many tables and many files.

-werners-
Esteemed Contributor III

Besides what has already been mentioned, it is best to let the Delta writer handle the location of the _delta_log and the parquet files; they belong together.
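
For example (the path below is made up), a plain Delta write creates the parquet files and the _delta_log together under a single table root:

```python
# Minimal sketch: letting the Delta writer manage the layout, so the data
# files and the _delta_log share one root. Assumes a Databricks notebook
# where `spark` is predefined; the path is hypothetical.
df = spark.range(10)
df.write.format("delta").mode("append").save(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/sometable/"
)
```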

migq2
New Contributor II

Thank you. However, in my specific case the parquet files are not written by Spark or Databricks but by another external tool.

Also, some files are shared by multiple tables, or a table can have files in different storage accounts.

This makes it infeasible to keep them in the single location a normal Spark writer would create them in.

-werners-
Esteemed Contributor III

I suggest you look at something other than UC for such cases. I also wonder whether Delta Lake is the right format.
