Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream

rvo19941
New Contributor II

Dear,

I am working on a real-time use case and am therefore using Auto Loader in file notification mode to ingest JSON files from an Azure Data Lake Storage Gen2 account in real time. Full refreshes of my table work fine, but I noticed that Auto Loader was not picking up new files landing in the storage account. I checked the Queue Storage and it stays empty. However, when I manually add a file, a message is added to the queue and the file is processed as expected.

After some digging, I found that the external system was writing these files to the storage account as a stream (when I inspect the properties of the files written by the external system, I see "application/octet-stream" as the CONTENT-TYPE, whereas a file I add manually shows "application/json"). This event type is not matched by default by the event subscription created by Databricks.

I tried adding it to the advanced filters of the event subscription (key-value pair data.api: CreateFile). This does generate messages in the queue. However, the Microsoft.Storage.BlobCreated event is triggered when the CopyBlob operation is initiated and no... and the CreateFile API call first creates the file and only afterwards adds content to it. As a result, the contentLength parameter of the corresponding queue message is set to 0, and Auto Loader considers the file to be empty even though it is not.
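To make the symptom easier to reproduce, here is a minimal sketch of how a queue message could be inspected for this case. It assumes the message body has already been decoded to the JSON text of an Event Grid event; the field names (eventType, data.api, data.contentLength) follow the Blob Storage event schema, while the sample values, the container path, and the helper name looks_empty_on_arrival are hypothetical and only there for illustration.

```python
import json

# Hypothetical sample of a decoded queue message (values are made up).
SAMPLE_MESSAGE = """
{
  "eventType": "Microsoft.Storage.BlobCreated",
  "subject": "/blobServices/default/containers/landing/blobs/events/sample.json",
  "data": {
    "api": "CreateFile",
    "contentType": "application/octet-stream",
    "contentLength": 0,
    "blobType": "BlockBlob",
    "url": "https://example.dfs.core.windows.net/landing/events/sample.json"
  }
}
"""


def looks_empty_on_arrival(event: dict) -> bool:
    """Return True for a BlobCreated event raised by CreateFile with no content yet.

    This mirrors the situation described above: the file has been created,
    but its content has not been written when the event is emitted.
    """
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return False
    data = event.get("data", {})
    return data.get("api") == "CreateFile" and data.get("contentLength", 0) == 0


if __name__ == "__main__":
    event = json.loads(SAMPLE_MESSAGE)
    print(event["data"]["api"], event["data"]["contentLength"])
    print("empty on arrival:", looks_empty_on_arrival(event))
```

Running this check against a few real messages from the queue would confirm whether every event from the external system arrives with data.api set to CreateFile and contentLength set to 0.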

Is there a solution/work-around or is this a limitation of file notification? Thanks in advance!

1 REPLY

Panda
Valued Contributor

@rvo19941 - Can you share your Auto Loader config?
