<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT Auto Loader Reading from Parent S3 Folder not Sub Folders in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149317#M53072</link>
    <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/632"&gt;@Saritha_S&lt;/a&gt;&amp;nbsp;for your prompt feedback and support. Suggested option worked for me.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 25 Feb 2026 23:14:03 GMT</pubDate>
    <dc:creator>FAHADURREHMAN</dc:creator>
    <dc:date>2026-02-25T23:14:03Z</dc:date>
    <item>
      <title>DLT Auto Loader Reading from Parent S3 Folder not Sub Folders</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149113#M53021</link>
      <description>&lt;P&gt;Hi All, I am trying to read CSV files from a single folder in an S3 bucket. For this particular use case, I do not intend to read from subfolders. I am using the code below, but it also reads all CSVs in the subfolders. How can I avoid that?&lt;BR /&gt;I tried many different versions of the code below with the help of ChatGPT, but none of them seem to work. Any help?&lt;/P&gt;
&lt;PRE&gt;def source_config():
    src_path = BASE_S3_URI.rstrip("/")

    options = {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": SCHEMA_LOCATION,
        "cloudFiles.inferColumnTypes": "true",
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        "cloudFiles.includeExistingFiles": "true",
        "cloudFiles.useNotifications": "false",
        "pathGlobFilter": "*.csv",
        "header": "true",
        "delimiter": ",",
        "quote": "\"",
        "multiLine": "false",
        # optional (can keep during debugging)
        "badRecordsPath": f"{SCHEMA_LOCATION}/bad_records",
        "columnNameOfCorruptRecord": "_corrupt_record",
        "cloudFiles.rescuedDataColumn": "_rescued_data",
    }

    return src_path, options&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Feb 2026 01:39:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149113#M53021</guid>
      <dc:creator>FAHADURREHMAN</dc:creator>
      <dc:date>2026-02-24T01:39:23Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Auto Loader Reading from Parent S3 Folder not Sub Folders</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149188#M53035</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/215264"&gt;@FAHADURREHMAN&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Please find below my findings&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Since you're using &lt;STRONG&gt;Auto Loader (&lt;CODE&gt;cloudFiles&lt;/CODE&gt;)&lt;/STRONG&gt;, this behavior is expected.&lt;/P&gt;
&lt;P&gt;By default, when you provide a path like:&lt;/P&gt;
&lt;PRE&gt;s3://bucket/folder/&lt;/PRE&gt;
&lt;P&gt;Auto Loader &lt;STRONG&gt;recursively reads all subfolders&lt;/STRONG&gt;.&lt;BR /&gt;&lt;CODE&gt;pathGlobFilter="*.csv"&lt;/CODE&gt; only filters file names; it does NOT prevent recursive directory traversal.&lt;/P&gt;
&lt;P&gt;To work around this, use one of the following:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Set &lt;CODE&gt;recursiveFileLookup&lt;/CODE&gt; to false&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;.option("recursiveFileLookup", "false")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use an Explicit Wildcard Instead of the Folder&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;s3://bucket/folder/*.csv&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Feb 2026 15:05:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149188#M53035</guid>
      <dc:creator>Saritha_S</dc:creator>
      <dc:date>2026-02-24T15:05:46Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Auto Loader Reading from Parent S3 Folder not Sub Folders</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149317#M53072</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/632"&gt;@Saritha_S&lt;/a&gt;&amp;nbsp;for your prompt feedback and support. Suggested option worked for me.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Feb 2026 23:14:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/149317#M53072</guid>
      <dc:creator>FAHADURREHMAN</dc:creator>
      <dc:date>2026-02-25T23:14:03Z</dc:date>
    </item>
    <item>
      <title>Hi @FAHADURREHMAN, This is expected behavior with Auto Lo...</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/150349#M53384</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/215264"&gt;@FAHADURREHMAN&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;This is expected behavior with Auto Loader. By default, when you point it at a directory path like s3://bucket/folder/, it recursively traverses all subdirectories and picks up matching files. The pathGlobFilter option only filters by file name pattern; it does not prevent Auto Loader from descending into subfolders.&lt;/P&gt;
&lt;P&gt;You have two options to restrict reading to only the top-level folder:&lt;/P&gt;
&lt;P&gt;OPTION 1: SET recursiveFileLookup TO FALSE&lt;/P&gt;
&lt;P&gt;Add this option to your configuration dictionary:&lt;/P&gt;
&lt;PRE&gt;"recursiveFileLookup": "false"&lt;/PRE&gt;
&lt;P&gt;So your options dict would include:&lt;/P&gt;
&lt;PRE&gt;options = {
  "cloudFiles.format": "csv",
  "cloudFiles.schemaLocation": SCHEMA_LOCATION,
  "cloudFiles.inferColumnTypes": "true",
  "cloudFiles.schemaEvolutionMode": "addNewColumns",
  "cloudFiles.includeExistingFiles": "true",
  "cloudFiles.useNotifications": "false",
  "recursiveFileLookup": "false",
  "pathGlobFilter": "*.csv",
  "header": "true",
  "delimiter": ",",
  "quote": "\"",
  "multiLine": "false",
  "badRecordsPath": f"{SCHEMA_LOCATION}/bad_records",
  "columnNameOfCorruptRecord": "_corrupt_record",
  "cloudFiles.rescuedDataColumn": "_rescued_data",
}&lt;/PRE&gt;
&lt;P&gt;When recursiveFileLookup is set to false, Auto Loader will only discover files in the immediate directory you specify, ignoring any subdirectories.&lt;/P&gt;
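&lt;P&gt;As a rough illustration of how the extra option could be wired in, here is a minimal sketch. The helper names with_top_level_only and read_source are hypothetical (they are not part of any Databricks API), and read_source assumes an active spark session such as the one a pipeline notebook provides.&lt;/P&gt;

```python
def with_top_level_only(base_options):
    """Return a copy of base_options with recursiveFileLookup set to "false"."""
    merged = dict(base_options)  # copy so the caller's dict is not mutated
    merged["recursiveFileLookup"] = "false"
    return merged


def read_source(spark, src_path, options):
    """Build an Auto Loader streaming reader from a path and an options dict.

    Assumes `spark` is an active SparkSession on a Databricks runtime.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(src_path)
```

&lt;P&gt;Copying the dictionary keeps the original options reusable for other sources where recursive discovery is still wanted.&lt;/P&gt;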
&lt;P&gt;OPTION 2: USE A WILDCARD PATH INSTEAD OF A DIRECTORY&lt;/P&gt;
&lt;P&gt;Instead of pointing to the folder:&lt;/P&gt;
&lt;PRE&gt;src_path = "s3://your-bucket/your-folder/"&lt;/PRE&gt;
&lt;P&gt;Use a wildcard that matches only top-level CSV files:&lt;/P&gt;
&lt;PRE&gt;src_path = "s3://your-bucket/your-folder/*.csv"&lt;/PRE&gt;
&lt;P&gt;This tells Auto Loader to only pick up files matching *.csv directly under that path, without descending into subfolders.&lt;/P&gt;
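&lt;P&gt;If the base URI comes from a variable, as in the original source_config function, the wildcard path can be derived from it. This is only a sketch: top_level_csv_glob is a hypothetical helper name, and the bucket and folder are placeholders.&lt;/P&gt;

```python
def top_level_csv_glob(base_uri):
    """Return base_uri with a *.csv pattern appended, avoiding a double slash."""
    return base_uri.rstrip("/") + "/*.csv"


src_path = top_level_csv_glob("s3://your-bucket/your-folder/")
print(src_path)  # s3://your-bucket/your-folder/*.csv
```

&lt;P&gt;Stripping the trailing slash first means the helper gives the same result whether or not the configured URI ends with one.&lt;/P&gt;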
&lt;P&gt;Either approach will work. Option 1 is generally the cleaner solution when using Lakeflow Spark Declarative Pipelines (SDP), since it keeps path handling simple and the behavior is controlled explicitly through configuration. Note that what was previously called DLT is now named Lakeflow Spark Declarative Pipelines (SDP).&lt;/P&gt;
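&lt;P&gt;Whichever option you pick, it may be worth confirming the result in your own workspace. The sketch below assumes you have already collected the source path of each ingested file into a Python list (for example by selecting the _metadata.file_path column on Databricks); files_outside_top_level is a hypothetical helper name, not a Databricks API.&lt;/P&gt;

```python
from posixpath import dirname


def files_outside_top_level(base_uri, file_paths):
    """Return the paths whose parent folder is not exactly base_uri."""
    base = base_uri.rstrip("/")
    return [path for path in file_paths if dirname(path) != base]


ingested = [
    "s3://your-bucket/your-folder/a.csv",
    "s3://your-bucket/your-folder/sub/b.csv",
]
print(files_outside_top_level("s3://your-bucket/your-folder/", ingested))
```

&lt;P&gt;An empty list suggests that only top-level files were picked up; any entries point at files that came from a subfolder.&lt;/P&gt;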
&lt;P&gt;For reference, the full list of Auto Loader options is documented here:&lt;BR /&gt;
&lt;A href="https://docs.databricks.com/aws/ingestion/cloud-object-storage/auto-loader/options" target="_blank"&gt;https://docs.databricks.com/aws/ingestion/cloud-object-storage/auto-loader/options&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;* This reply was researched and drafted with an agent system I built, drawing on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and monitor the system's reliability, updating it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;
&lt;P&gt;If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 05:54:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-auto-loader-reading-from-parent-s3-folder-not-sub-folders/m-p/150349#M53384</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-09T05:54:14Z</dc:date>
    </item>
  </channel>
</rss>

