AutoLoader - handle spark write transactional (_SUCCESS file) on ADLS

Marcin_U
New Contributor II

The Spark write method for parquet files (df.write.parquet) is transactional. I mean that after the write succeeds, a _SUCCESS file is created in the path where the parquet files were written.
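To make the marker behaviour concrete, here is a minimal sketch of checking for it by hand. It assumes the output directory is reachable as a local or mounted filesystem path; for ADLS itself a storage listing call would be needed instead. The helper names `is_write_complete` and `completed_parquet_files` are made up for illustration and are not part of Spark or AutoLoader.

```python
from pathlib import Path


def is_write_complete(output_dir):
    """Return True if the _SUCCESS marker file exists in output_dir."""
    return (Path(output_dir) / "_SUCCESS").is_file()


def completed_parquet_files(output_dir):
    """List the *.parquet files in output_dir, but only once _SUCCESS is present.

    Returns an empty list while the marker is missing, i.e. while the
    write is still running or has failed.
    """
    if not is_write_complete(output_dir):
        return []
    return sorted(str(p) for p in Path(output_dir).glob("*.parquet"))
```

This is only a manual check outside of AutoLoader; it does not change how AutoLoader discovers files.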


Is it possible to configure AutoLoader to load parquet files only when the write has completed successfully (i.e. the _SUCCESS file has appeared)?

Marcin_U
New Contributor II

I think my question wasn't understood correctly. I meant AutoLoader as the data loading tool provided by Databricks (https://docs.databricks.com/en/ingestion/auto-loader/index.html).

AutoLoader has a set of different options to configure (https://docs.databricks.com/en/ingestion/auto-loader/options.html), but I can't find any option that helps me achieve the result I described in this topic. Any ideas how to resolve my problem?

PotnuruSiva
Databricks Employee

@Marcin_U Please use the option below in the readStream to load only parquet files:

.option("pathGlobFilter", "*.parquet")
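For context, a minimal sketch of where this option could sit in an AutoLoader readStream. The option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `pathGlobFilter`) are documented AutoLoader / file source options. The function names, the `source_path` and `schema_location` arguments, and the local `matches_filter` check are assumptions added for illustration. `matches_filter` only approximates the glob match with Python's `fnmatch` and is not what Spark runs internally.

```python
from fnmatch import fnmatchcase


def autoloader_options(schema_location):
    """Build the reader options; schema_location is a placeholder path."""
    return {
        "cloudFiles.format": "parquet",
        "cloudFiles.schemaLocation": schema_location,
        # Only files whose name matches this glob are picked up.
        "pathGlobFilter": "*.parquet",
    }


def read_parquet_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame over the *.parquet files in source_path.

    Requires a Databricks runtime, where the cloudFiles source is available.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )


def matches_filter(file_name, pattern="*.parquet"):
    """Local approximation of the file name filter, for illustration only."""
    return fnmatchcase(file_name, pattern)
```

With this pattern a name such as part-00000.snappy.parquet matches, while _SUCCESS does not. Note that the filter selects files by name only: it keeps the _SUCCESS marker out of the stream, but by itself it does not wait for the marker to appear.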

Please refer to the below documentation:

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/options.html#:~:text=Defau...

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#:~:text=opti...