03-02-2023 05:48 AM
Hi,
I am running Auto Loader continuously; it checks for new files every minute. I need to store when each file was received/processed, but it is giving me the date when Auto Loader started.
Here is my code.
from datetime import date
from pyspark.sql.functions import lit

df = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.includeExistingFiles", "true")
    .option("cloudFiles.validateOptions", "true")
    .option("cloudFiles.region", "us-east-1")
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.fetchParallelism", 100)
    .option("cloudFiles.useNotifications", "true")
    .schema(streamSchema)
    .load(raw_path)
    .withColumn('process_date', lit(date.today()))
)
(df
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", bronze_checkpoint_path)
    .option("path", bronze_path)
    .option("mergeSchema", True)
    .trigger(processingTime="1 minute")  # or set this to whatever makes sense for the data source
    .start()
)
Appreciate any help.
Regards,
Sanjay
Labels: Autoloader, Date, Sanjay
Accepted Solutions
03-02-2023 11:06 AM
Hi @Sanjay Jain, there is currently no way to delete the files automatically. However, we are working on a feature called "CleanSource" that will do this. It is currently in private preview, and you can explore that option.
Alternatively, you can write a small piece of code that uses the file metadata column information to delete the files periodically.
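As a rough illustration of the second option, here is a minimal sketch of a periodic cleanup step. It assumes the stream has already stored each source file's path and processing time in the bronze table (for example from `_metadata.file_path`). The function names, the `(path, processed_at)` record shape, and the retention window are hypothetical and not part of any Databricks API.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Tuple


def files_safe_to_delete(
    records: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention: timedelta,
) -> List[str]:
    """Return source paths whose most recent processing time is at or before now - retention."""
    cutoff = now - retention
    latest = {}
    for path, processed_at in records:
        # Keep only the newest processing time seen for each path.
        if path not in latest or processed_at > latest[path]:
            latest[path] = processed_at
    return sorted(path for path, ts in latest.items() if ts <= cutoff)


def delete_files(paths: Iterable[str], remove: Callable[[str], object]) -> int:
    """Call remove(path) for every path and return how many calls were made."""
    deleted = 0
    for path in paths:
        remove(path)
        deleted += 1
    return deleted


# Assumed usage in a Databricks notebook (column names are hypothetical):
#   rows = (spark.read.format("delta").load(bronze_path)
#               .select("source_file", "processed_at").distinct().collect())
#   records = [(r["source_file"], r["processed_at"]) for r in rows]
#   paths = files_safe_to_delete(records, datetime.now(timezone.utc), timedelta(days=7))
#   delete_files(paths, dbutils.fs.rm)
```

Passing the delete function in as an argument keeps the selection logic easy to check with a dry run (for example `print`) before any files are actually removed. The timestamps passed as `now` and in `records` need to be consistently timezone-aware or consistently naive for the comparison to work.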
03-02-2023 06:55 AM
Hi @Sanjay Jain, you can use the file metadata column functionality to collect that information.
Reference doc: https://docs.databricks.com/ingestion/file-metadata-column.html
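For context, `lit(date.today())` is evaluated once on the driver when the query is defined, which is why every row carries the date the stream started. A minimal sketch of the metadata-column approach follows, assuming a Databricks runtime that exposes the hidden `_metadata` column for this source. The helper name and the output column names (`source_file`, `file_modified_at`, `processed_at`) are hypothetical.

```python
# SQL expressions appended to every row read by the stream.
AUDIT_EXPRS = [
    "_metadata.file_path AS source_file",
    # Last modification time of the source file, used here as a proxy for "received".
    "_metadata.file_modification_time AS file_modified_at",
    # In Structured Streaming this is evaluated per micro-batch, not once at stream start.
    "current_timestamp() AS processed_at",
]


def add_file_audit_columns(df):
    """Return df with all its columns plus the audit columns defined above."""
    return df.selectExpr("*", *AUDIT_EXPRS)


# Assumed usage with the reader from the question:
#   df = add_file_audit_columns(
#       spark.readStream.format("cloudFiles")
#            .option("cloudFiles.format", "json")
#            .schema(streamSchema)
#            .load(raw_path)
#   )
```

The hidden `_metadata` column is not returned by `*`, so its fields have to be selected explicitly as shown.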
03-02-2023 09:10 AM
Thank you, Lakshay. It's helpful.
Another query related to Auto Loader:
- How can files be deleted automatically once they have been processed successfully?
Regards,
Sanjay
03-02-2023 09:47 PM
Thank you Lakshay.

