<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader and files with invalid path in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125553#M47472</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175432"&gt;@databricks_use2&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I don't think there is an easy way to do this. The hiddenFileFilter property is always active, and this is not specific to Autoloader. Bypassing it could also break very basic functionality, such as reading Delta tables, because the reader would then descend into hidden files and folders. I suggest running a rename job first and then reading the renamed files.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Hope that helps,&lt;BR /&gt;&lt;BR /&gt;Best, Ilir&lt;/P&gt;</description>
    <pubDate>Thu, 17 Jul 2025 09:11:39 GMT</pubDate>
    <dc:creator>ilir_nuredini</dc:creator>
    <dc:date>2025-07-17T09:11:39Z</dc:date>
    <item>
      <title>Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125464#M47452</link>
      <description>&lt;P&gt;I'm encountering an issue with &lt;STRONG&gt;Autoloader&lt;/STRONG&gt; where it fails to process certain files due to specific characters in their names. For example, files that begin with an underscore (e.g., &lt;STRONG&gt;_data_etc.).json&lt;/STRONG&gt;) are ignored and not processed. After some investigation, I found that &lt;STRONG&gt;Spark ignores files starting with a leading _ or .&lt;/STRONG&gt; by default. However, I need to include these files in my processing pipeline. Is there a way to configure Autoloader to include such files?&lt;/P&gt;&lt;P&gt;Additionally, I'm facing another issue with certain file paths, such as &lt;STRONG&gt;s3://abc/&lt;A href="https://some_folder/xyz" target="_blank" rel="noopener"&gt;https://some_folder/xyz&lt;/A&gt;&lt;/STRONG&gt;. Autoloader throws an error in this case saying the file was not found. Is there a way to either process such paths or configure Autoloader to completely ignore folders with malformed or nested paths like these?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 15:13:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125464#M47452</guid>
      <dc:creator>databricks_use2</dc:creator>
      <dc:date>2025-07-16T15:13:09Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125553#M47472</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175432"&gt;@databricks_use2&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I don't think there is an easy way to do this. The hiddenFileFilter property is always active, and this is not specific to Autoloader. Bypassing it could also break very basic functionality, such as reading Delta tables, because the reader would then descend into hidden files and folders. I suggest running a rename job first and then reading the renamed files.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Hope that helps,&lt;BR /&gt;&lt;BR /&gt;Best, Ilir&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 09:11:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125553#M47472</guid>
      <dc:creator>ilir_nuredini</dc:creator>
      <dc:date>2025-07-17T09:11:39Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125561#M47477</link>
      <description>&lt;P&gt;I agree with&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102399"&gt;@ilir_nuredini&lt;/a&gt;. It's better to change the source file naming convention than to try to bypass the hidden file filter, especially when working with Delta Lake, since its internal metadata and transaction logs are also stored in hidden files and folders.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 09:56:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125561#M47477</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-17T09:56:03Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125563#M47479</link>
      <description>&lt;P&gt;I am just giving my suggestions&lt;BR /&gt;By default, Spark and Autoloader skip hidden files (those starting with _ or .). To include these in the Autoloader pipeline, use the following option: option("cloudFiles.includeHiddenFiles", "true")&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 10:06:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125563#M47479</guid>
      <dc:creator>Renjithrk</dc:creator>
      <dc:date>2025-07-17T10:06:54Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125564#M47480</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175210"&gt;@Renjithrk&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;There is no such option in Autoloader. Is it an undocumented one, or is this something suggested by ChatGPT? &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/options" target="_blank"&gt;Auto Loader options - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 10:21:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125564#M47480</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-17T10:21:52Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125565#M47481</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175210"&gt;@Renjithrk&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;I can't find this option in any documentation, so it does not appear to be available in cloudFiles.&lt;BR /&gt;You can check this link to see all available cloudFiles options:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Best, Ilir&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 10:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125565#M47481</guid>
      <dc:creator>ilir_nuredini</dc:creator>
      <dc:date>2025-07-17T10:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125566#M47482</link>
      <description>&lt;P&gt;That's right,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 10:29:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125566#M47482</guid>
      <dc:creator>ilir_nuredini</dc:creator>
      <dc:date>2025-07-17T10:29:13Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader and files with invalid path</title>
      <link>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125568#M47483</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175432"&gt;@databricks_use2&lt;/a&gt;&amp;nbsp;I'm merely echoing the responses above, but it sounds like you should rename those files before doing anything else.&lt;BR /&gt;&lt;BR /&gt;This post also supports the idea:&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/how-do-i-read-the-contents-of-a-hidden-file-in-a-spark-job/td-p/28026" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/how-do-i-read-the-contents-of-a-hidden-file-in-a-spark-job/td-p/28026&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_0-1752749589416.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18256iFE155577711B0A59/image-size/medium?v=v2&amp;amp;px=400" role="button" title="BS_THE_ANALYST_0-1752749589416.png" alt="BS_THE_ANALYST_0-1752749589416.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;All the best,&lt;BR /&gt;BS&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jul 2025 10:53:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autolader-and-files-with-invalid-path/m-p/125568#M47483</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-07-17T10:53:36Z</dc:date>
    </item>
  </channel>
</rss>

