Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Structured Streaming: difference between Unity Catalog and legacy clusters

MaximeGendre
New Contributor III

Hello :),
I have noticed a regression in one of my jobs and I don't understand why.

%python
print("Hello 1")

def toto(df, _):
    print("Hello 2")

spark.readStream\
    .format("delta")\
    .load("/databricks-datasets/nyctaxi/tables/nyctaxi_yellow")\
    .writeStream\
    .foreachBatch(toto)\
    .trigger(availableNow=True)\
    .start()\
    .awaitTermination()

With a Legacy 15.3 DBR cluster, both prints are displayed.
With a Unity Catalog 15.3 cluster, only the first one is displayed.

However, here is what I find in the "Standard error" logs:

Streaming ForeachBatch worker Started batch 0 with DF id 61cb46fc-3c78-4647-9784-ac01...
Hello 2
Streaming ForeachBatch worker Completed batch 0 with DF id 61cb46fc-3c78-4647-9784-ac01.....
ERROR: Query termination received for [id....

The same happens with df.show(2): the result is displayed in the error logs.

Any idea why this is happening?

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @MaximeGendre ,

You have probably hit one of the streaming limitations that apply to Unity Catalog standard access mode, assuming of course that you're using standard access mode 🙂
One of the limitations introduced with Databricks Runtime 14.0 on UC clusters is the following:

[Image: szymon_dybczak_0-1754916964525.png, screenshot of the limitation from the documentation page linked below]

Compute access mode limitations for Unity Catalog | Databricks Documentation

That is exactly what you're experiencing: on Unity Catalog-enabled clusters with DBR >= 14.0, print() calls inside foreachBatch write their output to the driver logs.
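If you need the per-batch information somewhere more convenient than the driver log, one possible workaround is to persist a small summary from inside the callback and read it back afterwards. The following is only a sketch under assumptions: the table name `main.default.foreach_batch_audit` and the helpers `build_audit_row`, `audit_batch` and `run_stream` are hypothetical names made up for illustration, and it assumes you are allowed to write to that table from your cluster.

```python
# Hypothetical audit table name (an assumption); replace it with a table you can write to.
AUDIT_TABLE = "main.default.foreach_batch_audit"


def build_audit_row(batch_id, row_count):
    """Build the (batch_id, row_count) tuple stored for one micro-batch."""
    return (int(batch_id), int(row_count))


def audit_batch(df, batch_id):
    """foreachBatch callback: persist a summary row instead of printing it."""
    row = build_audit_row(batch_id, df.count())
    # Use the session attached to the micro-batch DataFrame rather than a global.
    summary = df.sparkSession.createDataFrame(
        [row], "batch_id long, row_count long"
    )
    summary.write.mode("append").saveAsTable(AUDIT_TABLE)


def run_stream(spark):
    """Run the same stream as in the question, with audit_batch as the callback."""
    (
        spark.readStream
        .format("delta")
        .load("/databricks-datasets/nyctaxi/tables/nyctaxi_yellow")
        .writeStream
        .foreachBatch(audit_batch)
        .trigger(availableNow=True)
        .start()
        .awaitTermination()
    )
```

In a notebook you would call `run_stream(spark)` and then inspect the result with `spark.table(AUDIT_TABLE).show()`, which runs in the notebook itself and therefore displays in the cell output.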


3 REPLIES 3

MaximeGendre
New Contributor III

Hi @szymon_dybczak,
thanks a lot for the quick and accurate answer 🙂

I had forgotten about this limitation.

szymon_dybczak
Esteemed Contributor III

Hi @MaximeGendre ,

No problem, great that it worked for you 🙂
