
AutoLoader - Write To Console (Notebook Cell) Long Running Issue

ChristianRRL
Valued Contributor III

Hi there,

I am likely misunderstanding how to use AutoLoader properly while developing/testing. I am trying to write a simple AutoLoader notebook cell to *read* the contents of a path containing JSON files and *write* them to the console (i.e. the notebook cell) so I can visualize the results. I kicked this off yesterday before logging off, and when I logged back in this morning I realized the cell had been running for nearly 16 hours!

Can I get some assistance to understand what I'm doing wrong? I don't want to set up a permanent or long-running data stream right now. At this time I only have a file path with a very small number of files (fewer than 10, with a few more occasionally added manually), and I want to be able to easily view the contents of those files without needing a permanent or long-running stream.
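For reference, the cell (shown in the screenshot below) is roughly along these lines; the path and schema location here are illustrative placeholders, not my exact values:

```python
# Rough sketch of the Auto Loader cell described above.
# NOTE: the input path and schema location are placeholders, not real values.
df = (
    spark.readStream
    .format("cloudFiles")                                        # Auto Loader source
    .option("cloudFiles.format", "json")                         # input files are JSON
    .option("cloudFiles.schemaLocation", "/tmp/_schemas/demo")   # placeholder
    .load("/tmp/landing/json/")                                  # placeholder input path
)

# Write each micro-batch to the console; without any trigger setting,
# this query keeps running and keeps polling the path for new files.
query = (
    df.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
```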

[Attached screenshot: ChristianRRL_0-1754403001614.png]


3 REPLIES

SP_6721
Honored Contributor

Hi @ChristianRRL ,

It looks like spark.readStream with Auto Loader creates a continuous streaming job by default, which means it keeps running while waiting for new files.

To avoid this, you can control the behaviour using trigger(availableNow=True), which processes all the data available when the query starts and then stops, though it may break the work into multiple micro-batches.
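For example, something along these lines (a sketch only; the paths are the same placeholders as in the original cell):

```python
# Same Auto Loader read, but with a one-shot trigger.
# availableNow=True processes the files present when the query starts,
# then stops instead of waiting for new arrivals.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/_schemas/demo")   # placeholder
    .load("/tmp/landing/json/")                                  # placeholder
)

query = (
    df.writeStream
    .format("console")
    .outputMode("append")
    .trigger(availableNow=True)   # drain the available backlog, then finish
    .start()
)

query.awaitTermination()          # returns once the backlog has been processed
```

With the one-shot trigger, awaitTermination() only blocks until the existing files have been processed, so the cell finishes on its own instead of running indefinitely.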

ChristianRRL
Valued Contributor III

Fantastic! This is a great step forward; just one more thing. The trigger(availableNow=True) worked as you said, but I'm still not seeing the data displayed in the notebook cell. Is there something else I'm missing?

[Attached screenshot: ChristianRRL_0-1754407753844.png]

szymon_dybczak
Esteemed Contributor III

Hi @ChristianRRL ,

This is expected behavior. Under the hood, Auto Loader uses Spark Structured Streaming, and in Structured Streaming you can't use display the way you would with a batch query.

It would be beneficial to familiarize yourself with the Structured Streaming concepts. It's a whole different world from the traditional batch approach, hence your confusion:

https://spark.apache.org/docs/latest/streaming/index.html
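If the goal is simply to eyeball the files in a notebook cell, one common pattern (just a sketch with placeholder path and table names, not the only approach) is to let the availableNow query write into a table and then read that table back with a normal batch query:

```python
# Sketch: land the JSON files in a table with a one-shot trigger,
# then inspect the table as an ordinary batch read.
# All paths and the table name below are placeholders.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/_schemas/demo")    # placeholder
    .load("/tmp/landing/json/")                                   # placeholder
)

(
    df.writeStream
    .trigger(availableNow=True)                                   # one-shot run
    .option("checkpointLocation", "/tmp/_checkpoints/demo")       # placeholder
    .toTable("demo_json_raw")                                     # hypothetical table name
    .awaitTermination()
)

# Once the streaming query has finished, this is a plain batch read,
# so it shows up in the notebook cell like any other table.
display(spark.table("demo_json_raw"))
```

The streaming part finishes on its own thanks to the one-shot trigger, and the final display call is an ordinary batch query against the resulting table.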