
Autoloader Error Loading and Displaying

ChristianRRL
Valued Contributor II

Hi there,

I'd appreciate some help troubleshooting what should be a (somewhat) simple use of Auto Loader. Below are some screenshots highlighting my issue:

When I create the DataFrame with spark.readStream.format("cloudFiles"), it appears to be created with the correct nested structure, but when I run display() on it, I get the following error message:

Error while trying to fetch latest data. Please check Driver logs.

I've tried checking the driver logs, but to be honest they're not very clear.

[Screenshots not reproduced: ChristianRRL_0-1750702687568.png, ChristianRRL_1-1750702720386.png]
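Since the actual code is only visible in the screenshots, here is a minimal sketch of the kind of Auto Loader read described above, assuming JSON input. The function name, its parameters, and the example paths are hypothetical; cloudFiles.format, cloudFiles.schemaLocation, and cloudFiles.inferColumnTypes are standard Auto Loader options. Taking spark as a parameter keeps the function independent of how the session is created.

```python
def read_with_autoloader(spark, source_path, schema_location, file_format="json"):
    """Build a streaming DataFrame with Auto Loader (the "cloudFiles" source).

    Hypothetical helper: source_path is your input directory, schema_location
    is a writable directory where Auto Loader tracks the inferred schema.
    """
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", file_format)               # format of the input files
            .option("cloudFiles.schemaLocation", schema_location)   # where the inferred schema is stored
            .option("cloudFiles.inferColumnTypes", "true")          # infer types instead of reading everything as strings
            .load(source_path))


# Example usage (paths are hypothetical placeholders):
# df = read_with_autoloader(spark, "/mnt/raw/events/", "/mnt/schemas/events/")
```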


lingareddy_Alva
Honored Contributor II

Hi @ChristianRRL 


This is a common issue when calling display() on a Spark Structured Streaming DataFrame.
Displaying a streaming DataFrame runs a streaming query behind the scenes, and when that query fails, the notebook only reports this generic error. Here are several solutions:

1. Use writeStream instead of display()
For streaming DataFrames, use writeStream to output the data:

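# Instead of display(df)
# Note: on Databricks, console sink output goes to the driver's stdout log
query = (df.writeStream
    .format("console")           # or "memory", "delta", etc.
    .outputMode("append")        # or "complete", "update"
    .trigger(availableNow=True)  # process all available data, then stop (replaces the deprecated once=True)
    .start())

query.awaitTermination()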


2. Use a memory sink for testing
Write the stream to an in-memory table, which you can then query like a temporary view:
# Start the stream, writing to an in-memory table named "temp_table"
query = (df.writeStream
    .format("memory")
    .queryName("temp_table")
    .outputMode("append")
    .start())

# Wait until all currently available input has been processed
# (more reliable than time.sleep(); this method is intended for testing)
query.processAllAvailable()

# Now you can query the in-memory table
display(spark.sql("SELECT * FROM temp_table LIMIT 10"))

# Don't forget to stop the query
query.stop()
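The memory-sink steps can be wrapped in a small helper so the query is always stopped, even if processing fails. This is a minimal sketch; preview_stream and its parameters are hypothetical names. It uses the real StreamingQuery methods processAllAvailable() and stop(), and collects the rows before stopping so the result does not depend on the in-memory table afterwards.

```python
def preview_stream(spark, df, table_name="temp_table", limit=10):
    """Collect up to `limit` rows from a streaming DataFrame via the memory sink.

    Hypothetical helper wrapping the steps above. The query is always stopped,
    even if processing or the SQL query raises an error.
    """
    query = (df.writeStream
             .format("memory")
             .queryName(table_name)  # name of the in-memory table to query
             .outputMode("append")
             .start())
    try:
        query.processAllAvailable()  # wait until currently available input is processed
        return spark.sql(f"SELECT * FROM {table_name} LIMIT {int(limit)}").collect()
    finally:
        query.stop()  # runs after the rows have been collected


# Example usage:
# for row in preview_stream(spark, df):
#     print(row)
```

Because collect() brings the rows back to the driver, keep limit small.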


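The key point is that display() on a streaming DataFrame hides the underlying failure behind this generic message. Using writeStream to materialize the data first makes the stream's behavior explicit and lets you see the actual exception.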
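To see the real error instead of "Please check Driver logs", you can run the stream once through an explicit writeStream and catch what awaitTermination() raises. This is a sketch under stated assumptions: surface_stream_error and its parameters are hypothetical, and it defaults to Spark's "noop" sink, which discards rows so that only reading and parsing are exercised. Use a throwaway checkpoint path, because Auto Loader will not reprocess files already recorded in that checkpoint.

```python
def surface_stream_error(df, checkpoint_location, sink_format="noop"):
    """Run `df` once through an explicit writeStream and return its error, if any.

    Hypothetical helper. A failing writeStream query raises in the calling
    cell, so the underlying exception (for example a schema or permissions
    problem) can be inspected directly.
    """
    query = (df.writeStream
             .format(sink_format)  # "noop" discards rows; only reading/parsing is tested
             .option("checkpointLocation", checkpoint_location)
             .trigger(availableNow=True)  # process what is available, then stop
             .start())
    try:
        query.awaitTermination()  # raises StreamingQueryException if the query fails
    except Exception as err:  # caught broadly so this works across PySpark versions
        return err
    return None  # the query finished without an error


# Example usage (checkpoint path is a hypothetical placeholder):
# err = surface_stream_error(df, "/tmp/checkpoints/autoloader_debug")
# if err is not None:
#     print(err)
```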


LR
