Auto Loader with File Notification mode not picking up new files in Delta Live Tables pipeline
10-31-2024 08:52 AM
Dear community,
I am developing a Delta Live Tables pipeline that uses Auto Loader in File Notification mode to pick up files from an Azure storage account (not the storage account used by the default catalog). When I run a full refresh of the target streaming table, all existing files are processed. However, when I update the pipeline later on, new files are not picked up. I am using DLT with Unity Catalog and the default managed catalog.
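For context, the pipeline definition looks roughly like the following sketch (table name, file format, and path are placeholders, not my actual code):

```python
import dlt

# Rough sketch of the setup (placeholder names/paths): a DLT streaming
# table fed by Auto Loader in file notification mode, reading from the
# non-default storage account.
@dlt.table(name="raw_events")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")            # placeholder format
        .option("cloudFiles.useNotifications", "true")  # file notification mode
        .load("abfss://landing@<source_storage_account>.dfs.core.windows.net/events/")
    )
```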
Looking at the storage queue, I see the following streamStatus:
Unknown.
Reason: Failed to check the last update time of checkpoint directory abfss://unity-catalog-storage@<managed_storage_account>.dfs.core.windows.net/..., exception:
Failure to initialize configuration for storage account <managed_storage_account>..core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:715)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2100)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:272)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:239)
com.databricks.common.filesystem.LokiABFS.initialize(LokiABFS.scala:36)
com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:168)
com.databricks.common.filesystem.FileSystemCache.getOrCompute(FileSystemCache.scala:43)
com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:164)
com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:258)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:554)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
com.databricks.sql.fileNotification.autoIngest.ResourceManagementUtils$.getStreamStatus(ResourceManagementUtils.scala:62)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.$anonfun$listNotificationServices$1(CloudFilesAzureResourceManager.scala:78)
scala.collection.immutable.List.map(List.scala:293)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.listNotificationServices(CloudFilesAzureResourceManager.scala:74)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
py4j.Gateway.invoke(Gateway.java:306)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
py4j.ClientServerConnection.run(ClientServerConnection.java:119)
java.lang.Thread.run(Thread.java:750)
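The stack trace shows the status check failing before Auto Loader ever reads the queue: the ABFS client cannot even initialize a connection to the managed storage account that holds the checkpoint. As a sanity check, the same failure can be reproduced outside the pipeline by listing the checkpoint directory from a notebook running as the same principal (the path below is a placeholder for the truncated one in the error):

```python
# Placeholder for the truncated checkpoint path in the error message above.
checkpoint_dir = (
    "abfss://unity-catalog-storage@<managed_storage_account>"
    ".dfs.core.windows.net/<checkpoint_path>"
)

# If this raises the same "Invalid configuration value detected for
# fs.azure.account.key" error, the running principal has no working
# authentication method configured for the managed storage account.
display(dbutils.fs.ls(checkpoint_dir))
```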
What I already did:
- Transferred ownership of the DLT pipeline to the service principal (SP)
- Granted the SP access to the default catalog's external location (where the checkpoint is located)
- Double checked that the SP has ownership of …
- Double checked that the SP has the Contributor, EventGrid EventSubscription Contributor, and Storage Queue Data Contributor roles (see the credential options sketch after this list)
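These roles only help if Auto Loader is actually told to act as the SP when it manages the Event Grid subscription and storage queue. In file notification mode that is done through the cloudFiles options; a sketch with placeholder values (the secret scope and key names are assumptions):

```python
# The cloudFiles.* option names below are Auto Loader's documented
# file-notification options for Azure; scope/key names are placeholders.
sp_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",
    "cloudFiles.clientId": dbutils.secrets.get("my-scope", "sp-client-id"),
    "cloudFiles.clientSecret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "cloudFiles.tenantId": "<tenant_id>",
    "cloudFiles.subscriptionId": "<azure_subscription_id>",
    "cloudFiles.resourceGroup": "<storage_account_resource_group>",
}

df = (
    spark.readStream.format("cloudFiles")
    .options(**sp_options)
    .load("abfss://landing@<source_storage_account>.dfs.core.windows.net/events/")
)
```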
11-12-2024 10:33 PM
Based on the error "Invalid configuration value detected for fs.azure.account.key", the pipeline was still trying to use account key authentication instead of service principal authentication when it accessed the checkpoint location. Can you share your Auto Loader code?
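If so, the usual fix is to give the pipeline an OAuth (client credentials) configuration for the managed storage account, so the ABFS driver stops falling back to fs.azure.account.key. A sketch using the standard Hadoop ABFS settings (account, tenant, secret scope, and key names are placeholders; in a DLT pipeline these are typically set in the pipeline configuration with a spark.hadoop. prefix rather than in code):

```python
account = "<managed_storage_account>.dfs.core.windows.net"

# Standard Hadoop ABFS OAuth (client credentials) configuration keys.
spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{account}",
    dbutils.secrets.get("my-scope", "sp-client-id"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{account}",
    dbutils.secrets.get("my-scope", "sp-client-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{account}",
    "https://login.microsoftonline.com/<tenant_id>/oauth2/token",
)
```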

