08-05-2025 07:18 AM
Hi there,
I am likely misunderstanding how to use Auto Loader properly while developing/testing. I am trying to write a simple Auto Loader notebook cell to *read* the contents of a path with JSON files and *write* them to the console (i.e. the notebook cell) in order to visualize the results. I kicked this off yesterday before logging off, and when I logged back in this morning, I realized that the cell had been running for nearly 16 hours!
Can I get some assistance to understand what I'm doing wrong? I don't want to set up a permanent or long-running data stream right now. At this time, I only have a file path with a very small number of files (fewer than 10, with a few more occasionally added by hand), and I want to be able to easily view the contents of the files without a permanent or long-running stream.
08-05-2025 08:13 AM
Hi @ChristianRRL ,
It looks like spark.readStream with Auto Loader creates a continuous streaming job by default, which means the query keeps running while it waits for new files.
To avoid this, you can control the behaviour with trigger(availableNow=True), which processes all the data available when the query starts (possibly split across multiple micro-batches) and then stops.
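As a rough sketch of what that could look like (not necessarily how your cell is written): the function name, the paths, and the target table below are all placeholders I made up, and I'm assuming JSON input and a notebook where `spark` already exists.

```python
def start_available_now_load(spark, source_path, schema_path, checkpoint_path, target_table):
    """Start an Auto Loader query that drains the files currently present, then stops.

    All arguments are hypothetical placeholders: `spark` is the notebook's
    SparkSession, the three paths are locations you choose, and `target_table`
    is the name of the table the rows are appended to.
    """
    stream_df = (
        spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", "json")               # assumption: JSON input files
        .option("cloudFiles.schemaLocation", schema_path)  # where the inferred schema is tracked
        .load(source_path)
    )

    query = (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)     # remembers which files were already loaded
        .trigger(availableNow=True)                        # process what is there now, then stop
        .toTable(target_table)                             # starts the query and returns its handle
    )
    return query
```

In a notebook you would call it with your own values, e.g. `query = start_available_now_load(spark, source_path, schema_path, checkpoint_path, "my_test_table")`, and then `query.awaitTermination()` returns once the available files have been processed instead of blocking forever.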
08-05-2025 08:32 AM
Fantastic! This is a great step forward, just one more thing. The trigger(availableNow=True) worked as you said, but I'm still not seeing the data displaying in the notebook cell. Is there something else I'm missing?
08-05-2025 08:58 AM
Hi @ChristianRRL ,
This is expected behavior. Under the hood, Auto Loader uses Spark Structured Streaming, and in Structured Streaming you can't use display the way you would in batch.
It would be worth familiarizing yourself with Structured Streaming concepts. It is a whole different world from the traditional batch approach, hence your confusion.
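One way to still see the rows in the cell, sketched under some assumptions: let the availableNow query finish writing to a table, then read that table back as an ordinary batch DataFrame. The helper name, `query` (the handle returned when the stream was started) and `target_table` are hypothetical names for illustration.

```python
def preview_loaded_rows(spark, query, target_table, limit=20):
    """Wait for a finite streaming query to finish, then return a batch preview.

    `query` is assumed to be the StreamingQuery handle of an already started
    trigger(availableNow=True) stream that writes to `target_table`
    (both names are placeholders).
    """
    # Blocks until the query stops, which an availableNow query does by itself.
    query.awaitTermination()

    # From here on this is a plain batch read, so the usual batch tools apply.
    return spark.read.table(target_table).limit(limit)
```

The returned DataFrame is a normal batch DataFrame, so `display(preview_loaded_rows(spark, query, "my_test_table"))` or `.show()` on it should behave like any other batch query.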