Hey,
I agree it would be ideal to have the data on a storage account that supports queues, but unfortunately that is not in my control.
Regarding your option:
Use Event Hubs or Service Bus instead of Storage Queues:
- Reconfigure Event Grid: Change your Event Grid subscription endpoint from Storage Queue to Service Bus Topic/Queue or Event Hub
- Update Autoloader config: Use Service Bus connection string instead of storage account details
- This bypasses the cross-storage account issue since Service Bus isn't tied to a specific storage account
Can you please help me understand how to configure Auto Loader to do this? I can't see any such options in the documentation:
https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/option...
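For reference, these are the only Azure-specific notification options I can see on that page (a sketch with placeholder values, not our real config), and none of them look like they would accept a Service Bus or Event Hub endpoint:

azure_notification_options = {
    "cloudFiles.useNotifications": "true",
    # used when Auto Loader sets up its own Event Grid subscription and queue
    "cloudFiles.subscriptionId": "<subscription-id>",
    "cloudFiles.resourceGroup": "<resource-group>",
    "cloudFiles.tenantId": "<tenant-id>",
    "cloudFiles.clientId": "<service-principal-client-id>",
    "cloudFiles.clientSecret": "<service-principal-client-secret>",
    # used when pointing Auto Loader at an existing queue (our case)
    "cloudFiles.queueName": "<existing-queue-name>",
    "cloudFiles.connectionString": "<connection-string-for-the-queue-storage-account>",
}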
And regarding this option:
Custom Event Relay
If you must keep blob data in Premium:
- Use Azure Functions or Azure Logic Apps to read events from the Storage Account 2 queue and re-write or forward them to a queue on Storage Account 1 (or to a compatible service).
- But this is complex and fragile, and generally not recommended unless absolutely required.
Since Storage Account 1 is a premium storage account and doesn't support queues, we created an Azure Event Grid subscription on the storage account system topic of Storage Account 1 and have it output the events to a queue on Storage Account 2. I assume you are mainly talking about using a compatible service such as Event Hubs in your other option.
Just for a bit of a rant:
I would like to know why having a queue on a separate storage account, populated with messages from another storage account, couldn't be made to work. The AWS version, where the queueUrl is completely separate from the storage, appears to work exactly as I would expect.
The messages on the queue, being the system topic events, contain all the information needed to get to the correct data:
{"topic":"/subscriptions/{SUB_ID}/resourceGroups/{RG_001}/providers/Microsoft.Storage/storageAccounts/{SA-001}","subject":"/blobServices/default/containers/rawzone/blobs/{PATH_TO_PARQUET}","eventType":"Microsoft.Storage.BlobCreated","id":"{ID}","data":{"api":"CreateFile","clientRequestId":"{CLIENT_REQUEST_ID}","requestId":"{REQUEST_ID}","eTag":"{ETAG}","contentType":"application/octet-stream","contentLength":0,"contentOffset":0,"blobType":"BlockBlob","blobProperties":[{"acl":[{"access":"u::rw,g::r,o::","permission":"0640","owner":"{OWNER_ID}","group":"$superuser"}]}],"blobUrl":"https://{SA-001}.blob.core.windows.net/rawzone/{PATH_TO_PARQUET}","url":"https://{SA-001}.dfs.core.windows.net/rawzone/{PATH_TO_PARQUET}","sequencer":"00000000000000000000000000031001000000000000dd40","identity":"{ID}","storageDiagnostics":{"batchId":"{BATCH_ID}"}},"dataVersion":"3","metadataVersion":"1","eventTime":"2025-07-29T12:47:45.7969224Z"}
The event contains the storage account information in the topic and blobUrl fields.
Interestingly, Auto Loader must be using either the subject field or the blob URLs to get the relative path to the parquet, as no other fields contain it. So it seems like it has the name of the storage account where the data is actually located but must not be using it. I guess it would be nice to know how Auto Loader works under the hood.
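Just to illustrate, assuming a variable like message_body holding one of the events above (the name is just for the example), this is how I would naively pull the pieces out; I'm not claiming this is what Auto Loader actually does:

import json

event = json.loads(message_body)  # one of the queue messages shown above

# the relative path to the parquet only appears in subject / blobUrl / url
relative_path = event["subject"].split("/blobs/", 1)[1]

# the storage account the data actually sits on is right there as well
source_account = event["topic"].rsplit("/storageAccounts/", 1)[1]
blob_url = event["data"]["blobUrl"]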
My expectation would be that the Auto Loader configuration could happily have the path point to the location you want to read data from, while the additional queue configuration is allowed to point at a queue anywhere.
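In other words, something along these lines (a hypothetical sketch with placeholder names, essentially the Azure equivalent of pointing cloudFiles.queueUrl at any SQS queue on AWS):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "<schema-location>")
    # the queue lives on Storage Account 2, the only account that can host one...
    .option("cloudFiles.queueName", "<queue-on-storage-account-2>")
    .option("cloudFiles.connectionString", dbutils.secrets.get("<scope>", "<sa-2-connection-string-key>"))
    # ...while the data being loaded lives on Storage Account 1 (premium)
    .load("abfss://rawzone@<storage-account-1>.dfs.core.windows.net/<path>")
)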