
AutoLoader - handling the transactional Spark write (_SUCCESS file) on ADLS

Marcin_U
New Contributor II

The Spark write method for parquet files (df.write.parquet) is transactional. I mean that after the write succeeds, a _SUCCESS file is created in the path where the parquet files were written.

[Image: Marcin_U_0-1709647032623.png]

Is it possible to configure AutoLoader to load parquet files only when the write has completed successfully (that is, once the _SUCCESS file has appeared)?


Marcin_U
New Contributor II

I think my question wasn't understood correctly. By AutoLoader I mean the data loading tool provided by Databricks (https://docs.databricks.com/en/ingestion/auto-loader/index.html).

AutoLoader has a set of configurable options (https://docs.databricks.com/en/ingestion/auto-loader/options.html), but I can't find any option that would help me achieve the result I described in this topic. Any ideas how to solve my problem?

PotnuruSiva
Databricks Employee

@Marcin_U Please use the option below in the readStream to load only parquet files:

.option("pathGlobFilter", "*.parquet")
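
For context, here is a minimal sketch of where this option might sit in an Auto Loader stream. The function names, the source path, and the schema location are placeholders for illustration, not taken from the original post, and the stream itself only runs on a Databricks cluster where the `cloudFiles` source is available.

```python
def auto_loader_options(schema_location):
    """Build Auto Loader options that restrict ingestion to *.parquet files.

    `schema_location` is a placeholder path where Auto Loader can keep the
    inferred schema; adjust it to your own storage layout.
    """
    return {
        "cloudFiles.format": "parquet",
        "cloudFiles.schemaLocation": schema_location,
        # Match by file name only: files such as _SUCCESS are skipped.
        "pathGlobFilter": "*.parquet",
    }


def read_parquet_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame over `source_path` (needs a Databricks SparkSession)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
```

The filter selects files by name, so it keeps the `_SUCCESS` marker out of the stream, but it does not by itself wait for that marker to appear.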

Please refer to the documentation below:

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/options.html#:~:text=Defau...

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#:~:text=opti...  
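
If the load really has to wait for the `_SUCCESS` marker, one possible workaround is to list the output directories yourself and pick up only the parquet files from directories that already contain the marker. The sketch below shows only that selection step. It is an assumption about how this could be handled, not an Auto Loader feature, and the helper name and example paths are hypothetical.

```python
import posixpath
from collections import defaultdict


def completed_parquet_files(paths):
    """Return parquet files located in directories that also contain _SUCCESS.

    `paths` is any iterable of full file paths (for example from a directory
    listing). Directories without a _SUCCESS marker are ignored entirely.
    """
    names_by_dir = defaultdict(list)
    for path in paths:
        names_by_dir[posixpath.dirname(path)].append(posixpath.basename(path))

    completed = []
    for directory, names in names_by_dir.items():
        if "_SUCCESS" in names:  # the write to this directory has finished
            completed.extend(
                posixpath.join(directory, name)
                for name in names
                if name.endswith(".parquet")
            )
    return sorted(completed)


# Hypothetical listing: run1 is finished, run2 is still being written.
listing = [
    "abfss://data@account.dfs.core.windows.net/out/run1/part-0000.parquet",
    "abfss://data@account.dfs.core.windows.net/out/run1/_SUCCESS",
    "abfss://data@account.dfs.core.windows.net/out/run2/part-0000.parquet",
]
ready = completed_parquet_files(listing)
```

On Databricks the listing could come from something like `dbutils.fs.ls`, and the selected files could then be read in a batch job or copied to a separate folder that Auto Loader watches.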
