Autoloader exclude one directory

Nathant93
New Contributor II

Hi,

I have a bunch of CSV files in directories within an Azure Blob Storage container, and I am using Auto Loader to ingest them into a raw (bronze) table. All of the CSVs except one have the same schema. Is there a way to get Auto Loader to ignore the directory containing the one CSV with a different schema?

Thanks


Kaniz
Community Manager

Hi @Nathant93

  • You can use the pathGlobFilter option to filter files with a glob pattern (not a regular expression). Only files whose names match the pattern are read, so to load just the files named A1.csv, A2.csv, …, A9.csv you can specify the filter as follows:
  • df = spark.read.load("/file/load/location", format="csv", schema=schema, pathGlobFilter="A[0-9].csv")
    
    • Adjust the glob pattern to your specific use case [1].
    • In Spark's file sources the pattern is matched against file names, not directory paths, so on its own it cannot exclude a whole directory.
    • If you provide a schema to Auto Loader, it reads the files with that schema instead of inferring one.
    • To ignore specific columns that exist in some CSV files but not others, declare them in the schema and then drop them from the DataFrame after reading.
    • For example:
    • schema = "col1 STRING, col2 STRING, col3 STRING, col_to_ignore STRING"
      df = spark.read.load("/file/load/location", format="csv", schema=schema)
      df = df.drop("col_to_ignore")  # remove the unwanted column after reading
  • In this case, col_to_ignore is still parsed from the files, but drop() removes it from the resulting DataFrame.
  • Auto Loader can infer schemas from the data files. For CSV it assumes the files contain headers, so if your CSV files do not, set .option("header", "false").
  • Auto Loader stores schema information in a directory called _schemas at the configured cloudFiles.schemaLocation. This allows tracking schema changes over time.
  • To adjust the sample size used for schema inference, set the SQL configuration spark.databricks.clou...
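Because pathGlobFilter only looks at file names, one way to skip a whole directory is to put the glob in the Auto Loader load path itself: Auto Loader accepts glob patterns in the input path, including {a,b} alternation. The sketch below assumes the CSVs sit in immediate subdirectories of a single base path and that you know the name of the directory to skip. The helper names build_include_glob and start_bronze_stream, and every path and table name passed to them, are hypothetical placeholders.

```python
# Sketch only: build_include_glob and start_bronze_stream are hypothetical
# helpers, and all paths/table names passed to them are placeholders.

_GLOB_CHARS = set("{}[],*?\\")


def build_include_glob(base_path, subdirs, exclude):
    """Return a glob matching *.csv in every subdirectory except `exclude`.

    Uses {a,b,...} alternation in the path. Directory names containing glob
    metacharacters are rejected instead of escaped, to keep this simple.
    """
    keep = sorted(d for d in set(subdirs) if d != exclude)
    if not keep:
        raise ValueError("no subdirectories left after the exclusion")
    bad = [d for d in keep if _GLOB_CHARS & set(d)]
    if bad:
        raise ValueError(f"unsupported characters in directory names: {bad}")
    base = base_path.rstrip("/")
    if len(keep) == 1:
        return f"{base}/{keep[0]}/*.csv"
    return base + "/{" + ",".join(keep) + "}/*.csv"


def start_bronze_stream(spark, source_glob, schema_location,
                        checkpoint_location, target_table):
    """Start an Auto Loader stream from the CSVs matched by `source_glob`."""
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # Inferred schema versions are tracked under <schema_location>/_schemas.
        .option("cloudFiles.schemaLocation", schema_location)
        # Use "false" if the CSV files have no header row.
        .option("header", "true")
        .load(source_glob)
    )
    return (
        df.writeStream
        .option("checkpointLocation", checkpoint_location)
        .toTable(target_table)
    )
```

In a Databricks notebook the subdirectory names could come from dbutils.fs.ls(base_path), whose directory entries have names ending in "/" (strip that before passing them in). Because the glob is fixed when the stream starts, a directory added later is not picked up until the glob is rebuilt and the stream is started again.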
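The batch pathGlobFilter example in the reply can also be expressed for Auto Loader, which lists pathGlobFilter among its options. This sketch keeps the option values in a plain dict so they can be checked without a cluster. The helper names csv_autoloader_options and read_csv_stream are hypothetical, and the pattern selects files by name only, so it complements rather than replaces directory filtering in the load path.

```python
# Sketch only: csv_autoloader_options and read_csv_stream are hypothetical
# helpers; the schema location and source path are placeholders.

def csv_autoloader_options(schema_location, file_glob=None, has_header=True):
    """Collect Auto Loader options for CSV files as a plain dict."""
    opts = {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,
        "header": "true" if has_header else "false",
    }
    if file_glob is not None:
        # Only files whose names match this glob (e.g. "A[0-9].csv") are read.
        opts["pathGlobFilter"] = file_glob
    return opts


def read_csv_stream(spark, source_path, opts):
    """Create a streaming DataFrame with Auto Loader using `opts`."""
    return spark.readStream.format("cloudFiles").options(**opts).load(source_path)
```

For example, read_csv_stream(spark, "<source path>", csv_autoloader_options("<schema location>", file_glob="A[0-9].csv")) would read only the A0.csv to A9.csv files; pass has_header=False for headerless files.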