I am using DLT with file notification mode, and the job is fetching only 1 notification from the SQS queue at a time. The pipeline is expected to process 500K notifications per day, but it is running hours behind. Any recommendations?
df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.schemaLocation", "/mnt/abc/")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.useNotifications", "true")
    .option("skipChangeCommits", "true")
    .option("cloudFiles.backfillInterval", "3 hour")
    .option("cloudFiles.maxFilesPerTrigger", 10000)
    .load(source_path))  # source_path = input directory (path omitted here)
Logs:
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.