Hi @FAHADURREHMAN,
This is expected behavior with Auto Loader. By default, when you point it at a directory path such as s3://bucket/folder/, it recursively traverses all subdirectories and picks up matching files. The pathGlobFilter option only filters by file name pattern; it does not prevent Auto Loader from descending into subfolders.
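To make the distinction concrete, here is a plain-Python model of a filter that, like pathGlobFilter, looks only at the file name. It is an illustration, not Auto Loader's actual implementation, and the object keys and the helper name_filtered are hypothetical. Because the glob is tested against the last path segment only, CSV files sitting in subfolders still pass.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Hypothetical object keys under s3://your-bucket/your-folder/ (illustrative only).
KEYS = [
    "your-folder/a.csv",
    "your-folder/notes.txt",
    "your-folder/archive/old.csv",
    "your-folder/archive/2024/older.csv",
]

def name_filtered(keys, name_glob):
    """Model of a file-name-only filter: the glob is matched against the
    last path segment, so directory depth is never considered."""
    return [k for k in keys if fnmatchcase(basename(k), name_glob)]

print(name_filtered(KEYS, "*.csv"))
# ['your-folder/a.csv', 'your-folder/archive/old.csv', 'your-folder/archive/2024/older.csv']
```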
You have two options to restrict reading to only the top-level folder:
OPTION 1: SET recursiveFileLookup TO FALSE
Add this option to your configuration dictionary:
"recursiveFileLookup": "false"
So your options dict would include:
options = {
    "cloudFiles.format": "csv",
    "cloudFiles.schemaLocation": SCHEMA_LOCATION,
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
    "cloudFiles.includeExistingFiles": "true",
    "cloudFiles.useNotifications": "false",
    "recursiveFileLookup": "false",
    "pathGlobFilter": "*.csv",
    "header": "true",
    "delimiter": ",",
    "quote": "\"",
    "multiLine": "false",
    "badRecordsPath": f"{SCHEMA_LOCATION}/bad_records",
    "columnNameOfCorruptRecord": "_corrupt_record",
    "cloudFiles.rescuedDataColumn": "_rescued_data",
}
When recursiveFileLookup is set to false, Auto Loader will only discover files in the immediate directory you specify, ignoring any subdirectories.
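As a rough mental model of the behavior described above, the sketch below restricts discovery to the immediate directory first and then applies the file-name filter. It is plain Python for illustration only, not Auto Loader's implementation; the object keys and the helper top_level_only are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Hypothetical object keys under s3://your-bucket/your-folder/ (illustrative only).
KEYS = [
    "your-folder/a.csv",
    "your-folder/notes.txt",
    "your-folder/archive/old.csv",
    "your-folder/archive/2024/older.csv",
]

def top_level_only(keys, prefix):
    """Keep keys that sit directly under `prefix`, i.e. whose remainder after
    the prefix contains no further '/' (no subdirectory component)."""
    prefix = prefix.rstrip("/") + "/"
    return [k for k in keys if k.startswith(prefix) and "/" not in k[len(prefix):]]

# Depth restriction first, then the file-name filter (the pathGlobFilter role).
selected = [k for k in top_level_only(KEYS, "your-folder") if fnmatchcase(basename(k), "*.csv")]
print(selected)  # ['your-folder/a.csv']
```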
OPTION 2: USE A WILDCARD PATH INSTEAD OF A DIRECTORY
Instead of pointing to the folder:
src_path = "s3://your-bucket/your-folder/"
Use a wildcard that matches only top-level CSV files:
src_path = "s3://your-bucket/your-folder/*.csv"
This tells Auto Loader to only pick up files matching *.csv directly under that path, without descending into subfolders.
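The wildcard path stays at the top level because, assuming Hadoop-style glob semantics for the path, `*` does not match across a `/`. The sketch below models that rule in plain Python by matching the path segment by segment. glob_match is a hypothetical helper, and it does not cover Hadoop extras such as `{a,b}` alternation.

```python
from fnmatch import fnmatchcase

def glob_match(path, pattern):
    """Segment-wise glob: '*' and '?' never cross a '/' boundary, so the
    pattern must have exactly as many segments as the path."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, q) for p, q in zip(path_parts, pattern_parts))

paths = [
    "s3://your-bucket/your-folder/a.csv",
    "s3://your-bucket/your-folder/archive/old.csv",
]
pattern = "s3://your-bucket/your-folder/*.csv"
print([p for p in paths if glob_match(p, pattern)])
# ['s3://your-bucket/your-folder/a.csv']
```

The subfolder file is rejected simply because it has one more path segment than the pattern, so no separate recursion setting is involved.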
Either approach will work. Option 1 is generally the cleaner solution when using Lakeflow Spark Declarative Pipelines (SDP), since it keeps path handling simple and the behavior is controlled explicitly through configuration. Note that what was previously called DLT is now named Lakeflow Spark Declarative Pipelines (SDP).
For reference, the full list of Auto Loader options is documented here:
https://docs.databricks.com/aws/ingestion/cloud-object-storage/auto-loader/options
* This reply was researched and drafted by an agent system I built, drawing on the broad set of documentation I have available and on previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.