Databricks Community

sanjay · 03-02-2023

Hi,

I am running Auto Loader continuously, and it checks for new files every minute. I need to store when each file was received/processed, but the column gives me the date when Auto Loader started.

Here is my code.

```python
from datetime import date
from pyspark.sql.functions import lit

df = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.includeExistingFiles", "true")
    .option("cloudFiles.validateOptions", "true")
    .option("cloudFiles.region", "us-east-1")
    .option("cloudFiles.backfillInterval", "1 day")
    .option("cloudFiles.fetchParallelism", 100)
    .option("cloudFiles.useNotifications", "true")
    .schema(streamSchema)
    .load(raw_path)
    # date.today() is evaluated once, on the driver, when this query is defined
    .withColumn('process_date', lit(date.today()))
)

(df
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", bronze_checkpoint_path)
    .option("path", bronze_path)
    .option("mergeSchema", True)
    .trigger(processingTime="1 minute")  # or set this to whatever makes sense to the data source
    .start()
)
```

Appreciate any help.

Regards,

Sanjay


Lakshay · 03-02-2023

Hi @Sanjay Jain, you can use the file metadata column (`_metadata`) to collect that information, for example `_metadata.file_path` and `_metadata.file_modification_time` for each ingested file.

Reference: https://docs.databricks.com/ingestion/file-metadata-column.html
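Why the column shows the start date: in the question's code, `date.today()` is ordinary Python that runs once on the driver when the query is defined, and `lit(...)` turns that result into a constant, so every micro-batch writes the same value. A sketch of the fix, assuming a Databricks runtime where the file metadata column is available: add `.withColumn('file_modification_time', col('_metadata.file_modification_time'))` for when the file landed in storage (its last-modified time, which only approximates "received"), and/or `.withColumn('process_time', current_timestamp())` for when it was processed, since Structured Streaming re-evaluates `current_timestamp()` for each micro-batch (`col` and `current_timestamp` come from `pyspark.sql.functions`). The plain-Python simulation below uses hypothetical helpers `make_fake_clock` and `run_stream`, which are not Spark APIs; it only illustrates the difference between a value computed once and one computed per batch, and runs without a cluster.

```python
import datetime
from typing import Callable, Dict, List

Row = Dict[str, object]
Clock = Callable[[], datetime.datetime]


def make_fake_clock(start: datetime.datetime, step: datetime.timedelta) -> Clock:
    """Hypothetical stand-in for the wall clock: each call returns a time `step` later."""
    state = {"now": start - step}

    def now() -> datetime.datetime:
        state["now"] += step
        return state["now"]

    return now


def run_stream(batches: List[List[Row]], clock: Clock, per_batch: bool) -> List[Row]:
    """Stamp every row with a process_time, simulating a micro-batch stream.

    per_batch=False mimics lit(date.today()): the value is computed once,
    when the query is defined, and reused for every batch.
    per_batch=True mimics current_timestamp(), which Structured Streaming
    evaluates again for each micro-batch.
    """
    defined_at = clock()  # evaluated once, like the driver-side date.today()
    out: List[Row] = []
    for batch in batches:
        stamp = clock() if per_batch else defined_at
        out.extend({**row, "process_time": stamp} for row in batch)
    return out


start = datetime.datetime(2023, 3, 2, 9, 0)
minute = datetime.timedelta(minutes=1)
batches = [[{"file": "a.json"}], [{"file": "b.json"}]]

frozen = run_stream(batches, make_fake_clock(start, minute), per_batch=False)
fresh = run_stream(batches, make_fake_clock(start, minute), per_batch=True)
print([r["process_time"].strftime("%H:%M") for r in frozen])  # ['09:00', '09:00']
print([r["process_time"].strftime("%H:%M") for r in fresh])   # ['09:01', '09:02']
```

In the real stream you would keep both columns if you need to distinguish "arrived in storage" from "written to bronze".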

sanjay · 03-02-2023

Thank you, Lakshay. It's helpful.

Another question related to Auto Loader:

How can I delete files automatically once they have been processed successfully?

Regards,

Sanjay

Lakshay · 03-02-2023

Hi @Sanjay Jain, currently there is no way to delete the files automatically. However, we are working on a feature called "CleanSource" that will do this. It is currently in private preview, so you can explore that option.

Alternatively, you can write a small job that uses the file metadata column information to delete the processed files periodically.
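A minimal sketch of such a clean-up job, under these assumptions: the bronze table stores each row's source path (for example, a column filled from `_metadata.file_path`), you only read paths from rows already committed to that table so unprocessed files are never touched, and the delete function is injectable. The helper `delete_processed_files` is hypothetical; it defaults to `os.remove` on a local filesystem so it can run anywhere, and on Databricks you might pass something like `dbutils.fs.rm` instead (check how it reports missing paths, since the skip logic below relies on `FileNotFoundError`).

```python
import os
from typing import Callable, Iterable, List


def delete_processed_files(
    processed_paths: Iterable[str],
    remove: Callable[[str], object] = os.remove,
) -> List[str]:
    """Delete each already-processed source file once and return the removed paths.

    Hypothetical helper: processed_paths is assumed to come from the bronze
    table, e.g. the distinct values of a column populated from _metadata.file_path.
    """
    deleted: List[str] = []
    # Many rows come from the same file, so de-duplicate before deleting.
    for path in sorted(set(processed_paths)):
        try:
            remove(path)
        except FileNotFoundError:
            # Already removed by an earlier run of the clean-up job; skip it.
            continue
        deleted.append(path)
    return deleted


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as landing:
        paths = []
        for name in ("a.json", "b.json", "c.json"):
            p = os.path.join(landing, name)
            with open(p, "w") as fh:
                fh.write("{}\n")
            paths.append(p)
        # Pretend only a.json (two rows) and b.json are committed to bronze.
        print(len(delete_processed_files([paths[0], paths[0], paths[1]])))  # 2
        print(sorted(os.listdir(landing)))  # ['c.json']
```

Running it on a schedule (for example, a separate periodic job) keeps it decoupled from the streaming query itself.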

sanjay · 03-02-2023

Thank you, Lakshay.


How can I get date when autoloader processes the file
