
Delta Live Tables: How does it identify new files?

dbuschi
New Contributor

Hi,
I'm importing large numbers of Parquet files (about 5,200 files per day, each landing in a separate folder) into Azure Data Lake Storage (ADLS).
I have a DLT streaming table reading from the root folder.
I noticed a massive spike in storage account costs due to file system reads.
Questions: How does DLT identify newly arriving files? Does it always have to scan the entire folder, including all historical files?
Are there any design patterns that address this (e.g., regarding folder structure or archiving of processed files)?
Many thanks for your help!

1 REPLY

SparkJun
Databricks Employee

Please refer to the Auto Loader documentation for details: https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/ You can use Auto Loader in DLT to detect new files. The documentation also describes the file name patterns that Auto Loader supports.
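The reply points to Auto Loader without showing how it is wired into DLT. According to the Auto Loader documentation, the `cloudFiles` source records the files it has already discovered in the stream's checkpoint, so historical files are not ingested again. It supports two ways of discovering new files: directory listing (the default) and file notification mode, which picks up new files from storage events instead of repeatedly listing the input path. The following is a minimal sketch, assuming a Python DLT pipeline and Parquet input. The helper names `autoloader_options` and `landing_stream`, the table name `landing_raw`, and the `abfss://` path are hypothetical placeholders. On Azure, notification mode also needs permissions or credentials to set up Event Grid and a storage queue; those options are deliberately left out here and should be taken from the Auto Loader documentation.

```python
# Sketch only: builds Auto Loader ("cloudFiles") reader options and a streaming
# source for a DLT table. The helper names are hypothetical, not a Databricks API.


def autoloader_options(file_format: str, use_notifications: bool) -> dict:
    """Return reader options for the Auto Loader ("cloudFiles") source."""
    options = {"cloudFiles.format": file_format}
    if use_notifications:
        # File notification mode: new files are discovered from storage events
        # (Event Grid plus a storage queue on Azure) rather than by listing the
        # input directory. Azure credential/permission options are omitted here.
        options["cloudFiles.useNotifications"] = "true"
    return options


def landing_stream(spark, path: str, use_notifications: bool = True):
    """Return a streaming DataFrame that reads new Parquet files under `path`."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options("parquet", use_notifications))
        .load(path)
    )


# Inside a DLT pipeline, where `dlt` and `spark` are provided by Databricks,
# the table definition could look like this (placeholder path):
#
# import dlt
#
# @dlt.table(name="landing_raw")
# def landing_raw():
#     return landing_stream(
#         spark,
#         "abfss://<container>@<storage-account>.dfs.core.windows.net/landing/",
#     )
```

Keeping the option building in a plain function makes it easy to switch between the two discovery modes. The documentation presents notification mode as the more scalable choice for large input directories; whether it removes the read costs described in the question is not confirmed in this thread.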
