Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader - File Notification mode

Gilg
Contributor II

Hi All,

I have set up a DLT pipeline that uses Autoloader in file notification mode.

Everything ran smoothly the first time. However, the next micro-batch does not seem to trigger, even though I can see events arriving in the queue.
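For context, the source is defined along these lines. This is a sketch with placeholder format, paths, and checkpoint location, not the exact pipeline code; `cloudFiles.useNotifications` is the option that switches Autoloader from directory listing to file notification mode:

```python
# Minimal sketch of an Autoloader source in file notification mode.
# All paths and the file format below are placeholders.

AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",                              # placeholder format
    "cloudFiles.useNotifications": "true",                    # file notification mode
    "cloudFiles.schemaLocation": "/mnt/_checkpoints/schema",  # placeholder path
}

def build_autoloader_stream(spark, input_path="/mnt/landing"):
    """Build a streaming DataFrame that discovers new files via queue notifications."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in AUTOLOADER_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.load(input_path)  # placeholder input path
```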

[screenshot: events sitting in the storage queue]


But if I look at the Spark UI, there have been no active jobs for a while now.

[screenshot: Spark UI showing no active jobs]

Not sure what is happening here.

Cheers,

Gil

 

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @Gilg , 

  • If you're not seeing any active jobs in the Spark UI, let's explore a few possibilities:
    • Job submission: Ensure that your Spark job submission is correctly configured and that the pipeline is actively submitting jobs.
    • Resource allocation: Check that sufficient resources (CPU, memory, etc.) are allocated to your Spark application. Insufficient resources can lead to job starvation.
    • Stuck jobs: Jobs can get stuck for various reasons (e.g., resource contention, data skew, or inefficient transformations). Look for stuck or long-running tasks.
    • Logging and debugging: Review the logs and debug any errors or warnings related to your pipeline.
    • Checkpointing: If your pipeline uses checkpointing, verify that it is working as expected.
    • Data availability: Ensure that the events you're expecting are actually available in the input directory.
    • File permissions: Check that the Spark user has the necessary permissions to access the input directory and read the files.
    • Network issues: Verify that no network issues are preventing communication between Spark and the file system.
    • Driver/executor failures: Investigate whether any driver or executor failures might be causing the lack of active jobs.
    • If you recently changed the source path for Autoloader, note that changing the source path is not supported in file notification mode. In such cases, you might fail to ingest files that are already present in the new directory at the t...1.
    • Make sure you have the elevated permissions needed to automatically configure cloud infrastructure for file notification mode.
  • If you encounter any specific errors or need further assistance, feel free to share additional details! 🚀

Hi @Kaniz_Fatma 

I did some digging into the messages we are receiving.

 

By default, Autoloader generates the Event Grid system topic, event subscription, and Storage Queue endpoint.

 

[screenshot: auto-created Event Grid system topic and subscription]

 

[screenshot: auto-created Storage Queue endpoint]

 

Looking at the queue endpoint, it has a filter that was set automatically, shown below.

 

[screenshot: the automatically configured queue filter]

 

 

In our test, we removed this filter to see what messages we would get.
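One way to spot-check what is in the queue is to peek at the raw messages (e.g. with azure-storage-queue's `QueueClient.peek_messages`) and pull out the `data.api` field. The helper below is a hypothetical sketch assuming the standard Event Grid payload shape; it is not Databricks code:

```python
import base64
import json

def extract_api_tag(message_text):
    """Pull data.api out of an Event Grid message read from the storage queue.
    Queue messages may arrive as plain JSON or base64-encoded JSON."""
    try:
        payload = json.loads(message_text)
    except ValueError:
        payload = json.loads(base64.b64decode(message_text))
    return payload.get("data", {}).get("api")

# A BlobCreated event as it might appear for an ADLS Gen2 path-level create:
sample = json.dumps({
    "eventType": "Microsoft.Storage.BlobCreated",
    "data": {"api": "CreateFile"},
})
tag = extract_api_tag(sample)                                     # "CreateFile"
encoded_tag = extract_api_tag(base64.b64encode(sample.encode()).decode())
```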

 

We noticed that the messages we are receiving in the Storage Queue only have an api tag of "CreateFile".

[screenshot: queue messages tagged with api "CreateFile"]

 

But Autoloader seems to listen for different api tags, according to the screenshot below.

 

[screenshot: the api tags Autoloader listens for]

 

 

 

I think this could be why we are not getting any active jobs in the Spark UI: Autoloader is looking for different api tags than the ones our messages carry.
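To illustrate the hypothesis: if the consumer only acts on a fixed set of api tags, CreateFile-only events would sit in the queue unprocessed. The tag set below is an assumption for illustration (check your Event Grid subscription and the Databricks docs for the authoritative list):

```python
# Toy simulation of an api-tag filter mismatch; not Databricks internals.
# PROCESSED_API_TAGS is an assumed set, chosen only to illustrate the idea.

PROCESSED_API_TAGS = {"FlushWithClose", "PutBlob", "PutBlockList", "CopyBlob"}

def is_actionable(event):
    """Would a consumer filtering on PROCESSED_API_TAGS act on this event?"""
    return event.get("data", {}).get("api") in PROCESSED_API_TAGS

# An ADLS Gen2 upload can emit a CreateFile event first, then FlushWithClose:
queue = [
    {"eventType": "Microsoft.Storage.BlobCreated", "data": {"api": "CreateFile"}},
    {"eventType": "Microsoft.Storage.BlobCreated", "data": {"api": "FlushWithClose"}},
]
actionable = [is_actionable(e) for e in queue]   # [False, True]
```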

Not sure why this is happening.
