08-07-2023 07:57 AM
We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers.
Unfortunately, the trigger doesn't pass the path of the file that generated it to the job (the run parameters are empty). Is there any way to get this information?
08-07-2023 10:03 PM
For now, we do not send the file details as part of the trigger; the trigger is only used to run the pipeline.
Alternatively, you can use Auto Loader as part of the triggered pipeline to get the details of the files that arrived.
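For example, a minimal sketch of that pattern, using the `_metadata` column that file-based sources expose to record each arriving file's path (all paths and the target table name below are placeholders, not values from this thread):

```python
# `spark` is the session Databricks provides in notebooks/jobs.
# Paths and table name are hypothetical placeholders.
df = (spark.readStream
      .format("cloudFiles")                                    # Auto Loader source
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/tmp/schemas/source")
      .load("/mnt/landing/source")
      .selectExpr("*", "_metadata.file_path AS source_file"))  # path of each arriving file

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints/source")
   .trigger(availableNow=True)                                 # process available files, then stop
   .toTable("bronze_source"))
```

Because Auto Loader tracks which files it has already processed in the checkpoint, each triggered run only picks up the new arrivals, and the `source_file` column tells you exactly which files they were.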
08-08-2023 05:36 AM
"Alternately, You can use autoloader as part of the triggered pipeline to get the details of the file that arrived."
That doesn't quite fit our requirements unfortunately... Are there any plans on adding this functionality?
10-23-2024 02:31 AM
What do you need to achieve?
Auto Loader is much more efficient, since it can handle a batch of files (only new ones) in a single operation. Handling files one by one, especially with many files, increases latency and cost.
10-23-2024 06:24 AM
What I wanted to achieve was dynamic schema application based on which file was picked up.
So I implemented one Auto Loader task to collect files from a specific path, "source":
- source/employees/0001.csv
- source/holiday/0001.csv
If the path of the file were available, I could then apply the relevant schema at runtime.
But Auto Loader may want to process both files and put them into the same DataFrame?
Maybe this isn't the best use case; I guess you would recommend implementing multiple tasks/checkpoints for the respective folders, something like the sketch below?
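For context, the per-folder layout I have in mind would look roughly like this (the schemas, paths, and table names are just placeholders):

```python
from pyspark.sql.types import StructType, StructField, StringType, DateType

# Hypothetical schemas: one per entity folder under "source".
entity_schemas = {
    "employees": StructType([StructField("id", StringType()),
                             StructField("name", StringType())]),
    "holiday":   StructType([StructField("date", DateType()),
                             StructField("name", StringType())]),
}

for entity, schema in entity_schemas.items():
    (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .schema(schema)                                         # entity-specific schema
        .load(f"source/{entity}")                               # one folder per entity
        .writeStream
        .option("checkpointLocation", f"checkpoints/{entity}")  # one checkpoint per entity
        .trigger(availableNow=True)
        .toTable(f"bronze_{entity}"))                           # hypothetical target tables
```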
08-08-2023 10:48 AM
We have received this request from other customers too. Our engineering team has been notified and there is an internal ticket tracking it, but we don't have an ETA for now.
09-23-2024 09:22 AM
Any ETA? We are having to use other orchestration products because of this limitation.
02-15-2024 10:24 AM
Could you please provide an update on the status of this particular request? Additionally, do we have any ETA for it?
06-05-2024 01:18 AM
This is also something I'm interested in using. It would be really helpful to use a file arrival trigger and get information about exactly which file triggered the workflow!
09-23-2024 12:37 PM
Same here!
Additionally, it would be great to enhance it to support not just the path to a directory, but also a file-name prefix (or a regex, for bonus points). Right now, if you have 10 types of files arriving in the same folder, it would be much cleaner for each workflow handling a given type to process only the relevant file that arrived.
09-23-2024 12:46 PM
You can provide filter options to select only the relevant files:
https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/patterns.html#filtering-di...
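For instance, a glob pattern in the load path keeps a stream over a shared folder limited to one file type (the path and pattern here are placeholders):

```python
# Only pick up files matching the "employees_" prefix from a shared
# landing folder; path, pattern, and schema location are hypothetical.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/tmp/schemas/employees")
      .load("landing/shared/employees_*.csv"))   # glob on the file-name prefix
```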
09-24-2024 12:38 PM
For loading files with Auto Loader, for sure. My wish is to have a similar capability for the file arrival trigger.
"A file arrival trigger can be configured to monitor the root of a Unity Catalog external location or volume, or a subpath of an external location or volume."
https://learn.microsoft.com/en-us/azure/databricks/jobs/file-arrival-triggers
Quite often, files for multiple data entities (or even pipelines) land in the same directories from a given provider, and it would be great to be able to manage such scenarios easily.