Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader with File Notification mode not picking up new files in Delta Live Tables pipeline

rvo19941
New Contributor II

Dear,

I am developing a Delta Live Tables (DLT) pipeline and use Auto Loader with file notification mode to pick up files from an Azure storage account (which is not the storage account used by the default catalog). When I run a full refresh of the target streaming table, all existing files are processed. However, when I refresh the pipeline later on, new files are not picked up. I am using DLT with Unity Catalog and the default managed catalog.
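For context, the source side of the pipeline looks roughly like this (a minimal sketch: the file format, container, and path are placeholders, and the read itself runs only on a Databricks runtime):

```python
# Auto Loader source options used by the streaming table (placeholder values).
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",            # assumed file format
    "cloudFiles.useNotifications": "true",  # file notification mode
}

def raw_events_stream(spark):
    """Read the external landing container with Auto Loader.

    Only runnable inside a Databricks pipeline; the abfss path below is a
    placeholder for the non-default storage account described above.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        .load("abfss://landing@<source_storage_account>.dfs.core.windows.net/events/")
    )
```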

Looking at the storage queue, I see the following streamStatus:

Unknown.

Reason: Failed to check the last update time of checkpoint directory abfss://unity-catalog-storage@<managed_storage_account>.dfs.core.windows.net/..., exception:
Failure to initialize configuration for storage account <managed_storage_account>..core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:715)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2100)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:272)
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:239)
com.databricks.common.filesystem.LokiABFS.initialize(LokiABFS.scala:36)
com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:168)
com.databricks.common.filesystem.FileSystemCache.getOrCompute(FileSystemCache.scala:43)
com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:164)
com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:258)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:554)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
com.databricks.sql.fileNotification.autoIngest.ResourceManagementUtils$.getStreamStatus(ResourceManagementUtils.scala:62)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.$anonfun$listNotificationServices$1(CloudFilesAzureResourceManager.scala:78)
scala.collection.immutable.List.map(List.scala:293)
com.databricks.sql.aqs.autoIngest.CloudFilesAzureResourceManager.listNotificationServices(CloudFilesAzureResourceManager.scala:74)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
py4j.Gateway.invoke(Gateway.java:306)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
py4j.ClientServerConnection.run(ClientServerConnection.java:119)
java.lang.Thread.run(Thread.java:750)

What I already did:

  • Transferred ownership of the DLT pipeline to the service principal (SP)
  • Granted the SP access to the default catalog's external location (where the checkpoint is located)
  • Double-checked that the SP has ownership of 
  • Double-checked that the SP has the Contributor, EventGrid EventSubscription Contributor, and Storage Queue Data Contributor roles:

[Screenshot of the SP's role assignments: rvo19941_0-1730383733629.png]


1 REPLY

SparkJun
Databricks Employee

Based on the error "Invalid configuration value detected for fs.azure.account.key", the pipeline is still trying to use account-key authentication instead of service-principal authentication. Can we see your Auto Loader code?
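For reference, service-principal credentials can be passed directly to Auto Loader through its `cloudFiles.*` options so that setting up and checking the notification infrastructure does not fall back to a storage account key. A sketch (all values are placeholders; option names are the standard Auto Loader file-notification options):

```python
def autoloader_sp_options(client_id: str, client_secret: str, tenant_id: str,
                          subscription_id: str, resource_group: str) -> dict:
    """Build Auto Loader file-notification options that authenticate with an
    Azure service principal instead of a storage account key."""
    return {
        "cloudFiles.format": "json",           # assumed source format
        "cloudFiles.useNotifications": "true",
        # Service-principal credentials used to create/read the Event Grid
        # subscription and the storage queue:
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": client_secret,
        "cloudFiles.tenantId": tenant_id,
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.resourceGroup": resource_group,
    }

# Usage inside the pipeline (placeholder path and credentials):
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_sp_options(...))
#      .load("abfss://landing@<source_storage_account>.dfs.core.windows.net/events/"))
```

In practice the secret would come from a secret scope rather than being hard-coded.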
