Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader and files with invalid paths

databricks_use2
New Contributor II

I'm encountering an issue where Autoloader fails to process certain files because of specific characters in their names. For example, files that begin with an underscore (e.g., _data_etc.json) are ignored and not processed. After some investigation, I found that Spark ignores files starting with a leading _ or . by default. However, I need to include these files in my processing pipeline. Is there a way to configure Autoloader to include them?

Additionally, I'm facing another issue with certain file paths, such as s3://abc/https://some_folder/xyz. Autoloader throws an error in this case saying the file was not found. Is there a way to either process such paths, or configure Autoloader to completely ignore folders with malformed or nested paths like these?
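(For the second issue, one workable approach is to screen object keys before ingestion and quarantine the malformed ones. The helper below is a hypothetical sketch, not a Databricks API: it flags keys whose path embeds a second URI scheme, as in the example above.)

```python
def is_malformed_key(key: str) -> bool:
    """Return True if any path segment of an object key looks like a
    nested URI scheme (e.g. 'abc/https://some_folder/xyz'), which
    Auto Loader cannot resolve. Hypothetical helper for a pre-ingestion
    cleanup job; names are illustrative."""
    return any(
        seg.endswith(":") and seg[:-1].isalpha()
        for seg in key.split("/")
    )

# Example: keys containing an embedded scheme get quarantined,
# ordinary keys pass through untouched.
print(is_malformed_key("abc/https://some_folder/xyz"))  # True
print(is_malformed_key("abc/data_etc.json"))            # False
```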

7 REPLIES

ilir_nuredini
Honored Contributor

Hello @databricks_use2 ,

I don't think there is an easy way to do this. The hiddenFileFilter is always active, and this is not specific to Autoloader. Bypassing it could also break very basic functionality, such as reading Delta tables (you would start reading hidden metadata files). I suggest you run a rename job first and then read the files.
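A minimal sketch of such a rename-then-read step, assuming the files live in S3 and are renamed via boto3 (the bucket, prefix, and driver loop are hypothetical; S3 has no rename, so it is a copy followed by a delete):

```python
def visible_key(key: str) -> str:
    """Strip leading '_' or '.' characters from the final path segment
    so Spark/Auto Loader no longer treat the file as hidden.
    e.g. 'landing/_data_etc.json' -> 'landing/data_etc.json'"""
    parts = key.split("/")
    parts[-1] = parts[-1].lstrip("_.")
    return "/".join(parts)

# Hypothetical boto3 driver (assumption: plain S3 access, bucket/prefix
# names are placeholders). Run this as a job before starting the stream:
#
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="landing/")
# for obj in resp.get("Contents", []):
#     old, new = obj["Key"], visible_key(obj["Key"])
#     if new != old:
#         s3.copy_object(Bucket="my-bucket",
#                        CopySource={"Bucket": "my-bucket", "Key": old},
#                        Key=new)
#         s3.delete_object(Bucket="my-bucket", Key=old)
```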

Hope that helps,

Best, Ilir

szymon_dybczak
Esteemed Contributor III

I agree with @ilir_nuredini . It's better to change the source file naming convention than to try to bypass the hidden file filter, especially when working with Delta Lake, since internal metadata and transaction logs are also stored in hidden files and folders.

Renjithrk
New Contributor II

I'm just offering a suggestion.
By default, Spark and Autoloader skip hidden files (those starting with _ or .). To include these in the Autoloader pipeline, use the following option: option("cloudFiles.includeHiddenFiles", "true") 

Renjith Kumar | Azure Data Engineer | Databricks | Fabric | PySpark | SQL

szymon_dybczak
Esteemed Contributor III

Hi @Renjithrk ,

There is no such option in Autoloader. Is it undocumented, or is this something suggested by ChatGPT? 😄

Auto Loader options - Azure Databricks | Microsoft Learn

That's right @szymon_dybczak 😄

Hello @Renjithrk ,

I can't find this option in any documentation, so it is not available in cloudFiles.
You can check this link to see all available cloudFiles options: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options

Best, Ilir

BS_THE_ANALYST
Esteemed Contributor

@databricks_use2 I'm merely echoing the responses above, but it sounds like you should rename those files before doing anything else.

This post also supports the idea: https://community.databricks.com/t5/data-engineering/how-do-i-read-the-contents-of-a-hidden-file-in-...

All the best,
BS