03-02-2023 05:48 AM
Hi,
I am running Auto Loader continuously, checking for new files every 1 minute. I need to store when each file was received/processed, but the column I add gives me the date when Auto Loader started instead.
Here is my code.
from datetime import date
from pyspark.sql.functions import lit

df = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.includeExistingFiles", "true")
    .option("cloudFiles.validateOptions", "true")
    .option("cloudFiles.region", "us-east-1")
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.fetchParallelism", 100)
    .option("cloudFiles.useNotifications", "true")
    .schema(streamSchema)
    .load(raw_path)
    .withColumn('process_date', lit(date.today()))
)

(df
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", bronze_checkpoint_path)
    .option("path", bronze_path)
    .option("mergeSchema", True)
    .trigger(processingTime="1 minute")  # or set this to whatever makes sense for the data source
    .start()
)
Appreciate any help.
Regards,
Sanjay
03-02-2023 06:55 AM
Hi @Sanjay Jain, you can use the File Metadata column functionality to collect that information.
Reference: https://docs.databricks.com/ingestion/file-metadata-column.html
03-02-2023 09:10 AM
Thank you Lakshay. It's helpful.
Another query related to autoloader
Regards,
Sanjay
03-02-2023 11:06 AM
Hi @Sanjay Jain, currently we don't have a way to delete the files automatically. However, we are working on a feature called "CleanSource" which will do this. It is currently in private preview, so you can explore that option.
The other way is to write a small script that uses the file metadata column information to delete the files periodically.
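One possible shape for such a script, as a sketch only: it assumes the target table already stores each row's source path (taken from _metadata.file_path) together with a processing timestamp, and that those pairs have been collected into Python. The function name select_files_to_delete and the retention window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def select_files_to_delete(
    processed: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention: timedelta,
) -> List[str]:
    """Return the distinct file paths last processed before the retention cutoff.

    processed: (file_path, processed_at) pairs, one or more per file.
    """
    cutoff = now - retention
    latest: Dict[str, datetime] = {}
    for path, processed_at in processed:
        # Keep only the most recent processing time seen for each file.
        if path not in latest or processed_at > latest[path]:
            latest[path] = processed_at
    # A file qualifies only if even its latest processing time is old enough.
    return sorted(path for path, ts in latest.items() if ts < cutoff)
```

On Databricks, the returned paths could then be removed one at a time, for example with dbutils.fs.rm(path), from a job scheduled at whatever interval suits the retention policy. Selecting the paths separately from deleting them makes it easy to log or review the list before anything is removed.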
03-02-2023 09:47 PM
Thank you Lakshay.