Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

File Arrival Trigger in Azure Databricks

ShresthaBaburam
New Contributor II

We are using Databricks together with Azure, specifically Azure Data Lake Storage Gen2 (ADLS Gen2). We frequently mount ADLS Gen2 containers in the Databricks file system and use external locations and volumes for them.

Our use case involves building several data pipelines in Databricks, and we are currently stuck on setting up a file arrival trigger. The goal is to trigger a workflow whenever a new file lands in an ADLS Gen2 container and to pass the complete file path to the next task in the workflow.

We would appreciate guidance on how to:

  1. Set up a file arrival trigger in Databricks for an ADLS Gen2 container.
  2. Capture the file path and file name that triggered the event and pass it as a parameter to the next task in the pipeline.

Any advice or best practices to solve this issue would be greatly appreciated!

Thank you for your time and assistance.

Best regards,
Baburam Shrestha

1 ACCEPTED SOLUTION

Panda
Contributor

@ShresthaBaburam 

We raised this question a few days ago and checked with Databricks; they were working on it, but no ETA was given. You can find more details here: Databricks Community Link.

However, to address this use case, we followed the steps below:

  1. Configure Auto Loader with directory listing mode, using trigger(availableNow=True).
  2. Capture the file path: use the _metadata column to capture the path of each newly arrived file:
    df_with_path = df.withColumn("input_file_path", col("_metadata.file_path"))
  3. Pass the file path to the next task: once the path is captured, hand it to the next task in the pipeline via task values or task parameters (see the sketches after this list).
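For steps 1 and 2, here is a minimal sketch of an Auto Loader stream in directory listing mode that captures each file's full path from the _metadata column. The storage paths, input format, and target table are hypothetical placeholders; adjust them to your environment (spark is predefined in Databricks notebooks).

    from pyspark.sql.functions import col

    # Hypothetical ADLS Gen2 paths - replace with your own.
    source_path = "abfss://landing@<storage_account>.dfs.core.windows.net/incoming/"
    checkpoint_path = "abfss://landing@<storage_account>.dfs.core.windows.net/_chk/incoming/"

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")             # assumption: JSON input files
        .option("cloudFiles.useNotifications", "false")  # directory listing mode (the default)
        .load(source_path)
    )

    # Capture the full path of each newly arrived file from the _metadata column.
    df_with_path = df.withColumn("input_file_path", col("_metadata.file_path"))

    (
        df_with_path.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)   # process all pending files, then stop
        .toTable("bronze.incoming")   # hypothetical target table
    )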
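For step 3, one way to hand the captured paths to the next task in a Databricks job is task values. This is a sketch under stated assumptions: the upstream task name "ingest", the key "arrived_files", and the table name are all hypothetical.

    # In the ingestion task: collect the newly processed file paths and publish them.
    paths = [r["input_file_path"]
             for r in spark.table("bronze.incoming")
                           .select("input_file_path").distinct().collect()]
    dbutils.jobs.taskValues.set(key="arrived_files", value=paths)

    # In the downstream task: read the value set by the upstream task,
    # referencing it by its (hypothetical) task name "ingest".
    arrived = dbutils.jobs.taskValues.get(taskKey="ingest", key="arrived_files", default=[])
    for p in arrived:
        print(f"Processing {p}")

The downstream task can also receive the value as a task parameter via the dynamic value reference {{tasks.ingest.values.arrived_files}}. Keep in mind that task values must be JSON-serializable and are subject to a size limit.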

I hope this helps. 


