Structured Streaming: difference between Unity Catalog and Legacy

MaximeGendre
New Contributor III

Hello :),
I have noticed a regression in one of my jobs and I don't understand why.

%python
print("Hello 1")

# foreachBatch callback: receives the micro-batch DataFrame and the batch id
def toto(df, _):
    print("Hello 2")

spark.readStream\
     .format("delta")\
     .load("/databricks-datasets/nyctaxi/tables/nyctaxi_yellow")\
     .writeStream\
     .foreachBatch(toto)\
     .trigger(availableNow=True)\
     .start()\
     .awaitTermination()

With a Legacy 15.3 DBR cluster, both prints are displayed.
With a Unity Catalog 15.3 cluster, only the first one is displayed.

But here is what I find in the "Standard error logs":

Streaming ForeachBatch worker Started batch 0 with DF id 61cb46fc-3c78-4647-9784-ac01...
Hello 2
Streaming ForeachBatch worker Completed batch 0 with DF id 61cb46fc-3c78-4647-9784-ac01.....
ERROR: Query termination received for [id....

The behavior is the same for df.show(2): the result is displayed in the error logs.

Any idea why this is happening?

Thanks

 

szymon_dybczak
Esteemed Contributor III

Hi @MaximeGendre ,

You have probably hit one of the streaming limitations that apply to Unity Catalog standard access mode (assuming, of course, that you're using standard access mode 🙂).
One of the limitations introduced with Databricks Runtime 14.0 on UC clusters is the following:

[Screenshot from the documentation page linked below]

Compute access mode limitations for Unity Catalog | Databricks Documentation

This is exactly what you're experiencing. On Unity Catalog enabled clusters with DBR >= 14.0, a print inside foreachBatch writes its output to the driver's log instead of the notebook cell.
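
If you still want per-batch feedback in the notebook, one option is to read the query's progress records after it terminates instead of printing inside the callback. Below is a minimal sketch, not a definitive fix: the helper names format_progress and run_and_report are hypothetical, and it assumes that recentProgress is populated on your cluster and that each record exposes batchId and numInputRows.

```python
def format_progress(progress):
    """Turn one streaming progress record (dict-like) into a short line."""
    return f"batch {progress['batchId']}: {progress['numInputRows']} input rows"


def run_and_report(spark, source_path, batch_fn):
    """Run an availableNow stream, then return one summary line per progress record.

    `spark` is the notebook's SparkSession; `batch_fn` is the foreachBatch
    callback (for example `toto` from the question). Both helper names are
    hypothetical and only serve this illustration.
    """
    query = (
        spark.readStream
        .format("delta")
        .load(source_path)
        .writeStream
        .foreachBatch(batch_fn)
        .trigger(availableNow=True)
        .start()
    )
    query.awaitTermination()
    # recentProgress is read in the notebook session, so these lines can be
    # printed in the cell output rather than in the driver's log.
    return [format_progress(p) for p in query.recentProgress]


# Usage in a notebook cell (assumes `spark` and `toto` are already defined):
# for line in run_and_report(
#         spark, "/databricks-datasets/nyctaxi/tables/nyctaxi_yellow", toto):
#     print(line)
```

Anything printed inside batch_fn itself will still end up in the driver's log; only the summary built after awaitTermination() is shown in the cell.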

MaximeGendre
New Contributor III

Hi @szymon_dybczak,
thanks a lot for the quick and accurate answer 🙂

I forgot that there was this limitation.

szymon_dybczak
Esteemed Contributor III

Hi @MaximeGendre ,

No problem, great that it worked for you 🙂