Autoloader move file to archive immediately after processing
06-02-2025 01:40 AM
Hi,
We are using Auto Loader with Spark Structured Streaming (Databricks, file detection mode) and want to move files from the source to an archive folder immediately after each file is processed. However, I cannot reduce the retention window below 7 days.
Code:
.option("cloudFiles.cleanSource", "move")
.option("cloudFiles.cleanSource.moveDestination", archive_path_monthly)
.option("cloudFiles.cleanSource.retentionDuration", "interval 1 minutes")
Please suggest an alternative way to achieve this.
Note: I don't want to do this manually in code; I want to configure it through Auto Loader.
06-02-2025 03:00 AM
cloudFiles.cleanSource.retentionDuration
Type: Interval String
Amount of time to wait before processed files become candidates for archival with cleanSource. Must be greater than 7 days for DELETE. No minimum restriction for MOVE.
Available in Databricks Runtime 16.4 and above. Default value: 30 days
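If the cluster is on a runtime where the option above is supported, the three settings from the question could be collected in one place and passed to the stream together. This is only a minimal sketch: the helper name `clean_source_options`, the archive path, the source path, and the file format are hypothetical, and the retention string is the one the original poster tried.

```python
def clean_source_options(archive_path: str, retention: str) -> dict:
    """Build the Auto Loader options that archive processed files with MOVE."""
    return {
        "cloudFiles.cleanSource": "MOVE",
        "cloudFiles.cleanSource.moveDestination": archive_path,
        "cloudFiles.cleanSource.retentionDuration": retention,
    }


# Hypothetical archive path; the retention value is taken from the question.
opts = clean_source_options(
    "abfss://archive@example.dfs.core.windows.net/2025-06",
    "interval 1 minutes",
)

# On a Databricks cluster (assumed Runtime 16.4 or above) this would be used as:
# df = (spark.readStream.format("cloudFiles")
#       .option("cloudFiles.format", "json")   # assumed file format
#       .options(**opts)
#       .load(source_path))                    # source_path is hypothetical
```

Checking the cluster's runtime version first is probably the quickest way to tell whether the 7-day limit seen in the question comes from an older runtime.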
Alternative solutions:
1. Use an Azure Storage lifecycle management policy.
2. Create a Databricks job that cleans up the source folder (a Delta log tracker is required to make sure files have been processed before they are moved).
3. Use Azure Event Grid to trigger the move as each file is ingested, with an Azure Function that listens to the source directory and moves files immediately after Auto Loader ingestion.
4. Instead of moving files yourself, use a Databricks external location where the source folder is mapped to a temp directory, and let an Azure Storage tiering rule move the files automatically after Auto Loader ingestion.
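For option 2, the core of the cleanup job is "move a file only if it is known to be processed". The sketch below illustrates that rule on a local filesystem so it can be run anywhere; it is not a Databricks implementation. The function name `archive_processed` and the demo paths are hypothetical. On Databricks, the list of processed files might come from the stream's checkpoint state (for example via `cloud_files_state`) or from `_metadata.file_path` stored in the target table, and `dbutils.fs.mv` would take the place of `shutil.move`; those substitutions are assumptions, not something tested here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def archive_processed(processed: Iterable[str], source_dir: str, archive_dir: str) -> List[str]:
    """Move files from source_dir to archive_dir, but only those listed as processed."""
    src = Path(source_dir).resolve()
    dst = Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in processed:
        path = Path(name).resolve()
        # Skip anything outside the source folder or no longer present.
        if path.parent != src or not path.is_file():
            continue
        shutil.move(str(path), str(dst / path.name))
        moved.append(path.name)
    return moved


# Small local demo: a.json is "processed", b.json is not.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    archive = Path(tmp) / "archive"
    source.mkdir()
    (source / "a.json").write_text("{}")
    (source / "b.json").write_text("{}")

    moved = archive_processed([str(source / "a.json")], str(source), str(archive))
    remaining = sorted(p.name for p in source.iterdir())
    archived = sorted(p.name for p in archive.iterdir())
```

Driving the move from a list of confirmed-processed files, rather than from file age, is what keeps unprocessed files in the source folder.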
Vaibhav Sharma
Databricks Certified Professional
Microsoft Azure Certified Professional
Microsoft Certified Trainer