Using DBR 13.3, I'm ingesting data from one ADLS storage account with Auto Loader in file notification mode and writing to a container in another ADLS storage account. This is older code that uses a foreachBatch sink to process the data before merging it into Delta Lake tables.
Issue
Notifications are generated for new files, and when the streaming job runs, an epoch_id is produced for each batch processed in foreachBatch(), but the DataFrame for that epoch_id is empty.
The file referenced in the notification does contain data, so it's not that the source data is empty.
Moreover, if I switch to directory listing mode (the default), everything works.
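For reference, the switch between the two discovery modes is presumably driven by the cloudFiles.useNotifications option; it is not part of the options listed below, so the following two lines are an assumption about how the toggle looks rather than the actual job code:

options["cloudFiles.useNotifications"] = "true"    # file notification mode (batches arrive empty)
# options["cloudFiles.useNotifications"] = "false" # directory listing mode (works as expected)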
The following file-notification-specific options are set:
options["cloudFiles.includeExistingFiles"] = "false"
options["cloudFiles.subscriptionId"] = cloudfiles_subscriptionid
options["cloudFiles.tenantId"] = cloudfiles_tenantid
options["cloudFiles.clientId"] = cloudfiles_clientid
options["cloudFiles.clientSecret"] = cloudfiles_clientsecret
options["cloudFiles.resourceGroup"] = cloudfiles_resourcegroup
options["cloudFiles.fetchParallelism"] = 5
options["cloudFiles.resourceTag.streaming_job_autoloader_file_notification_enabled"] = 'true'
options["cloudFiles.resourceTag.streaming_job_autoloader_stream_id"] = 'some_id'
options["cloudFiles.queueName"] = "some_pregenerated_queue"