Dear community,
I am developing a Delta Live Tables pipeline that uses Auto Loader in file notification mode to pick up files from an Azure storage account (which is not the storage account used by the default catalog). When I do a full refresh of the target streaming table, all existing files are processed. However, when I refresh the pipeline later on, new files are not picked up. I am running DLT with Unity Catalog and the default managed catalog.
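For context, the source is defined roughly like this; this is a minimal sketch, and the path, table name, file format, and credential option values are placeholders rather than my actual configuration:

import dlt

# `spark` is provided by the DLT runtime; the path and option values below are placeholders.
SOURCE_PATH = "abfss://landing@<source_storage_account>.dfs.core.windows.net/incoming/"

@dlt.table(name="raw_events")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # File notification mode: Auto Loader sets up an Event Grid subscription
        # and a storage queue instead of listing the source directory.
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.subscriptionId", "<azure-subscription-id>")
        .option("cloudFiles.resourceGroup", "<resource-group>")
        .option("cloudFiles.tenantId", "<tenant-id>")
        .option("cloudFiles.clientId", "<sp-client-id>")
        .option("cloudFiles.clientSecret", "<sp-client-secret>")
        .load(SOURCE_PATH)
    )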
Looking at the storage queue, I see the following streamStatus:
Unknown.
Reason: Failed to check the last update time of checkpoint directory abfss://unity-catalog-storage@<managed_storage_account>.dfs.core.windows.net/..., exception:
Failure to initialize configuration for storage account <managed_storage_account>..core.windows.net: Invalid configuration value detected for fs.azure.account.key
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:715)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2100)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:272)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:239)
com.databricks.common.filesystem.LokiABFS.initialize(LokiABFS.scala:36)
com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:168)
com.databricks.common.filesystem.FileSystemCache.getOrCompute(FileSystemCache.scala:43)
com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:164)
com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:258)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:554)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
com.databricks.sql.fileNotification.autoIngest.ResourceManagementUtils$.getStreamStatus(ResourceManagementUtils.scala:62)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.$anonfun$listNotificationServices$1(CloudFilesAzureResourceManager.scala:78)
scala.collection.immutable.List.map(List.scala:293)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.listNotificationServices(CloudFilesAzureResourceManager.scala:74)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
py4j.Gateway.invoke(Gateway.java:306)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
py4j.ClientServerConnection.run(ClientServerConnection.java:119)
java.lang.Thread.run(Thread.java:750)
What I already did:
- Transferred ownership of the DLT pipeline to the service principal (SP)
- Granted the SP access to the default catalog's external location (where the checkpoint is located)
- Double checked that the SP has ownership of ...
- Double checked that the SP has the Contributor, EventGrid EventSubscription Contributor, and Storage Queue Data Contributor roles (the grant and role assignments are sketched after this list)
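For completeness, the external location grant from the second bullet was applied roughly like this; this is a sketch only, and the external location name `default_catalog_location` and the principal are placeholders:

from pyspark.sql import SparkSession

# In a Databricks notebook `spark` already exists; getActiveSession() keeps the snippet self-contained.
spark = SparkSession.getActiveSession()

# Grant the SP read/write on the external location that holds the checkpoint.
spark.sql("""
    GRANT READ FILES, WRITE FILES
    ON EXTERNAL LOCATION `default_catalog_location`
    TO `<sp-application-id>`
""")

# The Azure roles from the last bullet (Contributor, EventGrid EventSubscription
# Contributor, Storage Queue Data Contributor) were assigned to the same SP on
# the Azure side; they are not granted through Unity Catalog.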