10-07-2024 02:02 AM
We are using Databricks together with Azure, specifically Azure Blob Storage (Gen2). We frequently mount Azure containers in the Databricks File System and use external locations and volumes for those containers.
Our use case involves building several data pipelines in Databricks, and we are currently facing an issue with setting up a file arrival trigger. The goal is to trigger a workflow whenever a new file is dropped into an Azure Blob Storage container (Gen2), and we need to pass the complete file path to the subsequent processor in the workflow.
We would appreciate guidance on how to:
- Set up a file arrival trigger in Databricks for Azure Blob Storage (Gen2).
- Capture the file path and file name that triggered the event and pass it as a parameter to the next task in the pipeline.
Any advice or best practices to solve this issue would be greatly appreciated!
Thank you for your time and assistance.
Best regards,
Baburam Shrestha
Accepted Solutions
10-14-2024 04:29 PM
We inquired about this a few days ago and checked with Databricks. They were working on the issue, but no ETA was provided. You can find more details here: Databricks Community Link.
However, to address this use case, we followed the steps below (a sketch putting them together follows this list):
- Configure Auto Loader with directory listing mode; use trigger(availableNow=True) so each run picks up the newly arrived files and then stops.
- Capture the file path: use the _metadata column to capture the file path of the newly arrived file, e.g. df_with_path = df.withColumn("input_file_path", col("_metadata.file_path")).
- Pass the file path to the next task: once the file path is captured, pass it to the next task in the pipeline using the appropriate workflow or task parameter mechanism (for example, task values).
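In case a concrete example helps, here is a minimal sketch of how the three steps can be wired together in the first task of a job. The storage paths, file format, target table name, and task-value key are illustrative assumptions, not part of the original setup; spark and dbutils are the globals Databricks provides in notebooks.

```python
# Minimal sketch: Auto Loader (directory listing) -> capture file path via _metadata
# -> hand the path to the next job task. Paths, format, and names are placeholders.
from pyspark.sql.functions import col

source_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/incoming/"       # assumed container path
checkpoint_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/_chk/incoming/"

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                   # assumed source format
    .option("cloudFiles.useNotifications", "false")        # directory listing mode
    .option("cloudFiles.schemaLocation", checkpoint_path)  # schema inference/evolution state
    .load(source_path)
)

# _metadata is Spark's hidden file-metadata column; file_path holds the full source path.
df_with_path = (
    df.withColumn("input_file_path", col("_metadata.file_path"))
      .withColumn("file_modified", col("_metadata.file_modification_time"))
)

# availableNow processes whatever has arrived since the last run, then stops.
(
    df_with_path.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .toTable("bronze.incoming_files")                      # hypothetical target table
    .awaitTermination()
)

# Pick the most recently modified file and publish its path as a task value
# (adapt this if several files can arrive between runs).
latest = (
    spark.table("bronze.incoming_files")
    .orderBy(col("file_modified").desc())
    .select("input_file_path")
    .first()
)
if latest:
    dbutils.jobs.taskValues.set(key="input_file_path", value=latest["input_file_path"])
```

A downstream task can then read the value back with dbutils.jobs.taskValues.get(taskKey="<upstream task name>", key="input_file_path", default="") and use it wherever it needs the full path.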
I hope this helps.

