Databricks Community

kulkpd · ‎11-17-2023

I am using DLT with file notification, and the DLT job is fetching just 1 notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day, but it is running hours behind. Any recommendations?

(spark.readStream.format("cloudFiles")
    .option("cloudFiles.schemaLocation", "/mnt/abc/")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.useNotifications", True)
    .option("skipChangeCommits", "true")
    .option("cloudFiles.backfillInterval", "3 hour")
    .option("cloudFiles.maxFilesPerTrigger", 10000))

Logs:
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.
NotificationFileEventFetcher: [queryId =] Fetched 1 messages from cloud queue storage.
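For scale, the target volume can be turned into an average rate. This is only a back-of-the-envelope sketch based on the 500K/day figure above; it assumes notifications arrive evenly over the day, which real traffic may not do.

```python
# Average notification rate implied by the stated daily volume.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

notifications_per_day = 500_000  # figure from the post
required_rate = notifications_per_day / SECONDS_PER_DAY

print(f"{required_rate:.2f} notifications/second on average")  # prints 5.79
```

So the fetcher has to sustain roughly 6 messages per second on average just to keep up, before any backlog is worked off.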

Rdipak · ‎11-17-2023

Can you set cloudFiles.fetchParallelism to a higher number and try again? It is 1 by default.

kulkpd · ‎11-17-2023

Thanks. Setting cloudFiles.fetchParallelism to 100 definitely helped to read more messages from SQS.

NotificationFileEventFetcher: [queryId = 111] Fetched 100 messages from cloud queue storage
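For anyone landing here later, a minimal sketch of how the original reader might look with the suggested option added. The option names and values come from this thread; the `AUTO_LOADER_OPTIONS` dict and the `build_auto_loader_reader` helper are hypothetical names used only for illustration, and the input path passed to `.load()` is not shown in the original post, so it is left to the caller.

```python
# Options from the original post, plus the suggested fetchParallelism setting.
AUTO_LOADER_OPTIONS = {
    "cloudFiles.schemaLocation": "/mnt/abc/",
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.useNotifications": "true",
    "skipChangeCommits": "true",
    "cloudFiles.backfillInterval": "3 hour",
    "cloudFiles.maxFilesPerTrigger": "10000",
    # Threads used to fetch messages from the queue; 1 by default per the reply above.
    "cloudFiles.fetchParallelism": "100",
}


def build_auto_loader_reader(spark, options=AUTO_LOADER_OPTIONS):
    """Return a cloudFiles stream reader with every option applied.

    `spark` is the active SparkSession (hypothetical helper; on Databricks
    the session is provided by the runtime). The caller still has to call
    .load(...) with the input path, which the original post does not show.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader
```

Keeping the options in one dict makes it easy to try a different cloudFiles.fetchParallelism value and compare the "Fetched N messages" log lines between runs.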

Autoloader with filenotification
