AutoLoader - handling transactional Spark writes (_SUCCESS file) on ADLS
03-05-2024 06:01 AM
The Spark write method for Parquet (df.write.parquet) is transactional. By that I mean that after the write succeeds, a _SUCCESS file is created in the path where the Parquet files were written.
Is it possible to configure AutoLoader to load the Parquet files only when the write has completed successfully (that is, once the _SUCCESS file has appeared)?
03-12-2024 08:30 AM
I think my question wasn't understood correctly. I meant AutoLoader as the data loading tool provided by Databricks (https://docs.databricks.com/en/ingestion/auto-loader/index.html).
AutoLoader has a set of different options to configure (https://docs.databricks.com/en/ingestion/auto-loader/options.html), but I can't find any option that would help me achieve the result I described in this topic. Any ideas how to resolve my problem?
12-04-2024 06:14 AM
@Marcin_U Please use the option below in the readStream to load only Parquet files:
.option("pathGlobFilter", "*.parquet")
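For context, here is a minimal sketch of where that option would sit in an Auto Loader stream. The function names, the source path, and the schema location are placeholders I made up for illustration; only `cloudFiles`, `cloudFiles.format`, `cloudFiles.schemaLocation`, and `pathGlobFilter` come from the Databricks and Spark documentation. Note that, as far as I can tell, this filter only restricts which file names are picked up (so `_SUCCESS` itself is skipped); it does not by itself make the stream wait for the `_SUCCESS` marker to appear.

```python
def auto_loader_options(schema_location):
    """Build the reader options (hypothetical helper, not a Databricks API)."""
    return {
        # Tell Auto Loader the source files are Parquet.
        "cloudFiles.format": "parquet",
        # Where Auto Loader keeps the inferred schema (placeholder path).
        "cloudFiles.schemaLocation": schema_location,
        # Only pick up files whose name matches this glob.
        "pathGlobFilter": "*.parquet",
    }


def read_parquet_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame over source_path using Auto Loader.

    `spark` is an existing SparkSession on a Databricks cluster;
    `source_path` is an assumed ADLS location such as an abfss:// URI.
    """
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
```

On a cluster this would be called as `read_parquet_stream(spark, source_path, schema_location)` with your own paths.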
Please refer to the below documentation: