Data Engineering

Workflow file arrival trigger - does it apply to overwritten files?

mvmiller
New Contributor III

I am exploring the "file arrival" trigger for a workflow, for a use case I am working on. I understand from the documentation that it checks every minute for new files in an external location, then starts the workflow when it detects a new file.

What I don't know is whether a "new file" includes a file written under the same name as one already in the external location, overwriting it. In that case, the "modified at" timestamp would show that the file has been altered, so for the purposes of this use case it would be "new".

To give a concrete example, say I have an external location with 15 CSV files which are, the vast majority of the time, static tables. Occasionally, one or more of the CSV files will be updated with new content. When that happens, the file name remains the same, but the underlying data within the file will be different. In other words, the file will be overwritten. I want to know if the "file arrival" trigger would activate when this happens to any file at the external location that the trigger is set up to watch.
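
For illustration only, the two possible meanings of "new file" in the question can be sketched in plain Python. This is a hypothetical sketch, not how the trigger is implemented: `FileInfo`, `new_by_name`, and `new_by_name_or_mtime` are made-up names, and the listings are hard-coded instead of read from an external location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for one entry in a directory listing."""
    name: str
    modified_at: float  # epoch seconds


def new_by_name(previous, current):
    """Names in the current listing that were absent from the previous one."""
    seen = {f.name for f in previous}
    return sorted(f.name for f in current if f.name not in seen)


def new_by_name_or_mtime(previous, current):
    """Names that are new, or whose modified_at is later than last seen."""
    last_seen = {f.name: f.modified_at for f in previous}
    return sorted(
        f.name
        for f in current
        if f.name not in last_seen or f.modified_at > last_seen[f.name]
    )


# orders.csv is overwritten between the two listings; its name is unchanged.
before = [FileInfo("customers.csv", 100.0), FileInfo("orders.csv", 100.0)]
after = [FileInfo("customers.csv", 100.0), FileInfo("orders.csv", 250.0)]

print(new_by_name(before, after))
print(new_by_name_or_mtime(before, after))
```

Under the name-only reading the overwritten file goes unnoticed, while the "modified at" reading picks it up. The question is which of these the trigger follows.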

2 REPLIES

Rajani
Contributor II

Hi @mvmiller 
The "file arrival" trigger for a workflow goes by the name of the file. When a file with the same name was overwritten, the workflow was not triggered.

Hope I answered your question!

 

mo_moattar
New Contributor III

Hi,

We are in the same situation and what you say doesn't make sense. Auto Loader in DLT works exactly as @mvmiller described; however, the behavior in jobs and workflows is completely different. In this particular case, we need to implement an archiving structure on our landing zone if one of the feeds overwrites the old file.

The approach in Auto Loader is the correct one and doesn't need any further implementation.
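
One way such an archiving structure could look, as a minimal sketch: each delivery is copied into the landing zone under a unique, timestamped name, so it never overwrites an earlier file and always counts as a new file by name. The function names `versioned_name` and `land_delivery` are hypothetical, and local filesystem paths stand in for the external location, which is an assumption.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def versioned_name(filename: str, arrived_at: datetime) -> str:
    """Build a unique name by inserting a UTC timestamp before the extension."""
    path = Path(filename)
    stamp = arrived_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return f"{path.stem}_{stamp}{path.suffix}"


def land_delivery(source: Path, landing_dir: Path,
                  arrived_at: Optional[datetime] = None) -> Path:
    """Copy a delivered file into the landing zone without overwriting anything."""
    arrived_at = arrived_at or datetime.now(timezone.utc)
    landing_dir.mkdir(parents=True, exist_ok=True)
    target = landing_dir / versioned_name(source.name, arrived_at)
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target


# Two deliveries of the same feed get two different names.
first = versioned_name("orders.csv", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
second = versioned_name("orders.csv", datetime(2024, 1, 3, 3, 4, 5, tzinfo=timezone.utc))
print(first, second)
```

The downstream job then has to pick the latest version per feed, which is the extra implementation work that an overwrite-aware trigger would avoid.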
