I have an external delta table in Unity Catalog (let's call it mycatalog.myschema.mytable) that consists only of a `_delta_log` directory that I create semi-manually, with the corresponding JSON files that define it.
The JSON files point to parquet files that are not in the same directory as the `_delta_log`, but in a different one (it can even be a different Azure storage account; I am on Azure Databricks).
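For context, the table is registered in Unity Catalog as an external table whose location is the directory holding the `_delta_log`, roughly like this (paths are placeholders, not my real ones):

```python
# Register the external table; the LOCATION only contains the semi-manually
# created _delta_log, the parquet files live elsewhere (placeholder paths).
spark.sql("""
    CREATE TABLE IF NOT EXISTS mycatalog.myschema.mytable
    USING DELTA
    LOCATION 'abfss://tablecontainer@tablestorageaccount.dfs.core.windows.net/mytable/'
""")
```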
As an example, the JSON could look like this:
```json
{
  "add": {
    "dataChange": true,
    "modificationTime": 1710850923000,
    "partitionValues": {},
    "path": "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
    "size": 12345,
    "stats": "{\"numRecords\":123}",
    "tags": {
      "INSERTION_TIME": "1710850923000000",
      "MAX_INSERTION_TIME": "1710850923000000",
      "MIN_INSERTION_TIME": "1710850923000000",
      "OPTIMIZE_TARGET_SIZE": "268435456"
    }
  }
}
```
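The commit files themselves are plain newline-delimited JSON that I write out with ordinary file utilities; a simplified sketch (the commit version and the `dbutils.fs.put` call are illustrative, not my exact tooling):

```python
import json

# One "add" action per line, pointing at an absolute abfss:// path outside the
# table directory (protocol/metaData actions of the initial commit omitted here).
add_action = {
    "add": {
        "path": "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
        "partitionValues": {},
        "size": 12345,
        "modificationTime": 1710850923000,
        "dataChange": True,
        "stats": json.dumps({"numRecords": 123}),
    }
}

# Write the next commit into the table's _delta_log (illustrative path/version).
dbutils.fs.put(
    "abfss://tablecontainer@tablestorageaccount.dfs.core.windows.net/mytable/_delta_log/00000000000000000001.json",
    json.dumps(add_action),
    overwrite=True,
)
```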
When I try to read my delta table using `spark.sql("SELECT * FROM mycatalog.myschema.mytable")` I get the following error:

```
RuntimeException: Couldn't initialize file system for path abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet
```
This suggests Databricks is not trying to access that file through Unity Catalog external locations, but through the storage account key instead.
The path is covered by an external location, and I can access it without problems using UC credentials with

```python
spark.read.load("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/", format="delta")
```
Is there a way to use UC external locations with a delta table that uses absolute paths in the `_delta_log`? For security reasons, I don't want to add the storage account key to the cluster via `spark.conf` (`fs.azure.account.key.mystorageaccount.dfs.core.windows.net`).
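In other words, I want to avoid having to do something like the following on every cluster (the secret scope and key names are just placeholders):

```python
# The workaround I want to avoid: authenticating to the second storage account
# with its account key instead of the Unity Catalog external location credential.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="mystorageaccount-key"),  # placeholder secret
)
```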