Databricks Community

Rik · ‎08-07-2023

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers.

Unfortunately, the trigger doesn't pass the path of the file that caused it to the job (the Run Parameters are empty). Is there any way to get this information?

Tharun-Kumar · ‎08-07-2023

@Rik

For now, we do not send the file details as part of the trigger. The trigger is used to run a pipeline.

Alternatively, you can use Auto Loader as part of the triggered pipeline to get the details of the file that arrived.
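A minimal sketch of that suggestion, assuming the triggered job runs on Databricks with a `spark` session available and that the incoming files are CSV. Auto Loader can expose each row's source file through the `_metadata.file_path` column. The function names, the paths, and the choice of CSV are illustrative assumptions, not details from this thread.

```python
def auto_loader_options(schema_location: str, file_format: str = "csv") -> dict:
    """Build reader options for an Auto Loader ("cloudFiles") stream."""
    return {
        "cloudFiles.format": file_format,
        # Directory where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def read_new_files(spark, source_dir: str, schema_location: str):
    """Return a streaming DataFrame with an extra `file_path` column.

    `spark` is the SparkSession provided by the Databricks runtime;
    `source_dir` and `schema_location` are hypothetical paths.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_dir)
        # _metadata is a hidden column; it must be selected explicitly.
        .select("*", "_metadata.file_path")
    )
```

In a job started by the file arrival trigger, writing this stream with `.trigger(availableNow=True)` and a checkpoint location should process only the files that have not been seen before, and the `file_path` column then tells you which files those were.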

Rik · ‎08-08-2023

"Alternately, You can use autoloader as part of the triggered pipeline to get the details of the file that arrived."

That doesn't quite fit our requirements unfortunately... Are there any plans on adding this functionality?

mattiazeni · ‎10-23-2024

What do you need to achieve?

Auto Loader is much more efficient since it can handle a batch of files (only the new ones) in a single operation. Handling files one by one, especially when there are many of them, will increase latency and cost.

marcuskw · ‎10-23-2024

What I wanted to achieve was dynamic schema application based on which file was picked up.
So I implemented one Auto Loader task to collect files from a specific path, "source":
- source/employees/0001.csv
- source/holiday/0001.csv

If the path of the file were available, I could then apply the relevant schema at runtime.
But Auto Loader may want to process both files and put them into the same DataFrame?
Maybe this isn't the best use case. I guess you would recommend implementing multiple tasks/checkpoints for the respective folders?
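
One way the per-folder idea could look, as a rough sketch: keep a schema per folder and derive the folder from the file path. The column names in the schemas are invented for illustration, and `schema_for` and `folder_sources` are hypothetical helpers, not Databricks APIs.

```python
from pathlib import PurePosixPath

# Hypothetical schemas as Spark DDL strings; the columns are made up.
SCHEMAS = {
    "employees": "id INT, name STRING, hired DATE",
    "holiday": "employee_id INT, start_date DATE, end_date DATE",
}


def schema_for(file_path: str) -> str:
    """Return the schema for the folder a file sits in."""
    # "source/employees/0001.csv" -> "employees"
    folder = PurePosixPath(file_path).parent.name
    try:
        return SCHEMAS[folder]
    except KeyError:
        raise ValueError(f"no schema registered for folder {folder!r}") from None


def folder_sources(root: str) -> dict:
    """Map each known folder to the directory its own stream would read."""
    return {name: f"{root.rstrip('/')}/{name}" for name in SCHEMAS}
```

Each entry of `folder_sources("source")` could then get its own Auto Loader stream and checkpoint, with the schema passed as `.schema(SCHEMAS[name])`, so files from different folders never end up in the same DataFrame.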

Tharun-Kumar · ‎08-08-2023

@Rik

We have received this request from other customers too. Our engineering team has been notified and there is an internal ticket for it, but we don't have an ETA for now.

srsnarendran · ‎09-23-2024

Any ETA? We are having to use other orchestration products because of this limitation.

Panda · ‎02-15-2024

Could you please provide an update on the status of this particular request? Additionally, do we have any ETA for it?

marcuskw · ‎06-05-2024

This is also something I'm interested in using. It would be really helpful to use the file arrival trigger and get information about exactly which file triggered the workflow!

artemich · ‎09-23-2024

Same here!

Additionally, it would be great to enhance it to support not just the path to a directory, but also a file name prefix (or a regex for bonus points). Right now, if you have 10 types of files arriving in the same folder, it would be much cleaner if each workflow handling a given type only processed the relevant files that arrived.

marcuskw · ‎09-23-2024

You are able to provide filter options to select only relevant files:
https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#filtering-di...
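
As a rough illustration of one such filter: the `pathGlobFilter` reader option keeps only files whose names match a glob. The file names and helper functions below are hypothetical, and the local preview uses Python's `fnmatch`, which only approximates the glob rules Spark applies.

```python
from fnmatch import fnmatchcase


def filtered_options(name_glob: str, file_format: str = "csv") -> dict:
    """Auto Loader options that restrict ingestion to matching file names.

    A schema or cloudFiles.schemaLocation still has to be supplied separately.
    """
    return {
        "cloudFiles.format": file_format,
        "pathGlobFilter": name_glob,
    }


def preview_matches(file_names, name_glob):
    """Approximate locally which names the glob would let through."""
    return [name for name in file_names if fnmatchcase(name, name_glob)]
```

For example, `filtered_options("employees_*.csv")` could be passed to `spark.readStream.format("cloudFiles").options(**...)` so that a stream reading a shared folder only picks up the employee files.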

artemich · ‎09-24-2024

For loading files with Auto Loader, for sure. My wish is to have a similar capability for the file arrival trigger.

"A file arrival trigger can be configured to monitor the root of a Unity Catalog external location or volume, or a subpath of an external location or volume."
https://learn.microsoft.com/en-us/azure/databricks/jobs/file-arrival-triggers

Quite often files for multiple data entities (or even pipelines) land in the same directories from a given provider and it would be great to be able to easily manage such scenarios.