Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[Auto Loader] Inquiry regarding Checkpoint files

ha2hi
Visitor

Hi,

I am currently using Auto Loader to load files stored in the cloud into Databricks tables. I understand that checkpoint files are continuously generated during this process.

I have a couple of questions regarding these files:

  • Do these checkpoint files continue to accumulate indefinitely over time?

  • Is there a way to compress or delete them periodically?

I look forward to hearing from you. Best regards.

1 REPLY

balajij8
Contributor III

Never manually delete or alter files inside a checkpoint directory, as doing so will corrupt the Auto Loader stream.

Auto Loader keeps track of discovered files in the checkpoint location using RocksDB, which is how it provides exactly-once ingestion guarantees.

  • You can upgrade to Databricks Runtime 17 or above for high-volume or long-lived ingestion streams.
  • You can control the size of the checkpoint state using the cloudFiles.maxFileAge option, which expires file events that are older than a given period. Set it to 30 days if your workload allows it.
  • You can use Auto Loader's cleanSource option, which deletes or archives the source files after they have been successfully processed.
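
As a rough illustration of the options above, here is a minimal sketch in Python. The helper names (`auto_loader_options`, `start_ingest`), the paths, and the table name are hypothetical. I am assuming the full option keys are `cloudFiles.cleanSource` and `cloudFiles.cleanSource.moveDestination`, and that `cloudFiles.maxFileAge` accepts an interval string such as "30 days"; please check the Auto Loader options reference for your runtime before relying on them.

```python
def auto_loader_options(file_format, max_file_age="30 days",
                        clean_source="OFF", archive_dir=None):
    """Build a dict of cloudFiles options (hypothetical helper, not a Databricks API)."""
    options = {
        "cloudFiles.format": file_format,
        # Expire tracked file events older than this interval.
        "cloudFiles.maxFileAge": max_file_age,
    }
    if clean_source != "OFF":
        # Assumed values: "DELETE" removes processed source files, "MOVE" archives them.
        options["cloudFiles.cleanSource"] = clean_source
        if clean_source == "MOVE":
            if archive_dir is None:
                raise ValueError("clean_source='MOVE' needs an archive_dir")
            options["cloudFiles.cleanSource.moveDestination"] = archive_dir
    return options


def start_ingest(spark, source_dir, checkpoint_dir, target_table, options):
    """Start an Auto Loader stream; `spark` is the SparkSession a notebook provides."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        # Schema inference state is stored alongside the checkpoint in this sketch.
        .option("cloudFiles.schemaLocation", checkpoint_dir)
        .load(source_dir)
        .writeStream
        .option("checkpointLocation", checkpoint_dir)
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

In a notebook you would call something like `start_ingest(spark, "<source path>", "<checkpoint path>", "<table name>", auto_loader_options("json"))`. Note that cleanSource acts on the source files only; the checkpoint directory itself should still be left alone.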