Data Engineering

How to detect a gap in filenames (Autoloader)

L1000
New Contributor III

My files arrive in cloud storage, and I have configured Auto Loader to read them.

The files have a monotonically increasing id in their name.

How can I detect a gap in the sequence and stop the DLT pipeline as soon as one appears?

e.g.

Autoloader finds file1, ingests

Autoloader finds file2, ingests

Autoloader finds file3, ingests

Autoloader finds file5 -> file4 is missing: STOP

Is this possible using DLT? Or should I go for a streaming job?

1 REPLY

SparkJun
Databricks Employee

This doesn't seem possible with Auto Loader in DLT, particularly because you need the pipeline to stop automatically, without manual intervention. Instead, you can write a custom Structured Streaming job that uses foreachBatch to process the incoming files and applies sequence-checking logic to the file IDs, stopping the stream when an ID is missing.
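A minimal sketch of what that sequence check could look like, not a tested production job. It assumes the ID is the last run of digits in the file name, that the stream carries the file path in a column named `source_file` (a hypothetical name, filled from `_metadata.file_path`), and that the last seen ID is kept in a plain dict; a real job would need to persist that value somewhere durable. The helper names (`extract_id`, `find_gap`, `check_paths`, `make_batch_handler`) are illustrative. Raising an exception inside the foreachBatch function is what stops the query.

```python
import re

# Assumption: the ID is the last run of digits in the name, before any extension.
_ID_PATTERN = re.compile(r"(\d+)(?:\.\w+)*$")


class SequenceGapError(RuntimeError):
    """Raised when a file ID is missing from the sequence."""


def extract_id(path):
    """Return the numeric ID embedded in a file path such as '.../file12.json'."""
    match = _ID_PATTERN.search(path)
    if match is None:
        raise ValueError(f"no numeric ID in file name: {path}")
    return int(match.group(1))


def find_gap(last_id, new_ids):
    """Return the first missing ID after last_id, or None if there is no gap."""
    expected = last_id + 1
    for file_id in sorted(set(new_ids)):
        if file_id < expected:
            continue  # already seen in an earlier batch
        if file_id > expected:
            return expected
        expected += 1
    return None


def check_paths(state, paths):
    """Raise SequenceGapError on a gap; otherwise advance state['last_id']."""
    ids = [extract_id(p) for p in paths]
    missing = find_gap(state["last_id"], ids)
    if missing is not None:
        raise SequenceGapError(f"file with ID {missing} is missing")
    if ids:
        state["last_id"] = max(state["last_id"], max(ids))


def make_batch_handler(state):
    """Build a foreachBatch function that checks the sequence for each micro-batch."""
    def handle_batch(batch_df, batch_id):
        rows = batch_df.select("source_file").distinct().collect()
        check_paths(state, [row["source_file"] for row in rows])
        # No gap found: write batch_df to the target table here.
    return handle_batch


# Hypothetical wiring (needs a Spark session on Databricks; paths are placeholders):
# (spark.readStream.format("cloudFiles")
#      .option("cloudFiles.format", "json")
#      .option("cloudFiles.schemaLocation", "/path/to/schema")
#      .load("/path/to/landing")
#      .selectExpr("*", "_metadata.file_path AS source_file")
#      .writeStream
#      .foreachBatch(make_batch_handler({"last_id": 0}))
#      .option("checkpointLocation", "/path/to/checkpoint")
#      .start())
```

The check sorts the IDs within each micro-batch, so file order inside a batch does not matter. A file that simply lands late in a later batch would still be reported as a gap, which matches the "stop as soon as there is a gap" requirement in the question.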
