Hi there, the file arrival trigger seems handy, but I have questions about the performance and cost implications of using it. Per the file arrival trigger documentation:
"File arrival triggers do not incur additional costs other than cloud provider costs associated with listing files in the storage location."
This is potentially concerning. For example, let's say we have a data extraction pipeline that loads 100k .json files to a landing path in a given year. If we use the file arrival trigger to monitor for new files (e.g. it checks every minute), then whenever a new file arrives, all of the other 100k files would still need to be scanned/listed just to pick up the one new file, incurring both a cost and a performance impact. Worse still, this file scan/listing happens every minute whether or not there is a new file, so we would be incurring costs from the file listing operation even when no new data has arrived.
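To put rough numbers on my concern, here is a back-of-the-envelope sketch. It assumes the worst case I describe above (a full listing of the landing path on every check), which I have not confirmed is how the trigger actually works. The page size of 1,000 keys per list request and the price of 0.005 per 1,000 list requests are placeholder assumptions of mine, not figures from the documentation; substitute your cloud provider's real values.

```python
import math


def listing_requests_per_day(file_count, keys_per_page=1000, polls_per_day=24 * 60):
    """Estimate list requests per day if every poll lists the whole path.

    keys_per_page and polls_per_day are assumptions for illustration:
    one poll per minute, and up to 1,000 keys returned per list request.
    """
    # Even an empty path needs at least one list request per poll.
    requests_per_poll = max(1, math.ceil(file_count / keys_per_page))
    return requests_per_poll * polls_per_day


def daily_listing_cost(file_count, price_per_1000_requests, **kwargs):
    """Convert the request estimate into a daily cost at a given price."""
    requests = listing_requests_per_day(file_count, **kwargs)
    return requests / 1000 * price_per_1000_requests


if __name__ == "__main__":
    requests = listing_requests_per_day(100_000)
    # Hypothetical price; check your provider's pricing page.
    cost = daily_listing_cost(100_000, price_per_1000_requests=0.005)
    print(f"{requests} list requests per day, about {cost:.2f} per day")
```

Under these assumptions, 100k files means 100 list requests per check and 144,000 list requests per day, and that number grows as more files accumulate in the landing path.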
I would like some help understanding whether my example/assumptions above are correct. If they are, in what context does it make sense to use a file arrival trigger? And if my example/assumptions are incorrect, please let me know how!