transformWithStateInPandas throws "Spark connect directory is not ready" error

felix4572
New Contributor II

Hello,

We use arbitrary stateful aggregations in our data processing streams on Azure Databricks and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We use the Python API throughout our solution, and some of our workspaces do NOT yet have Unity Catalog enabled.

Trying to run the examples provided in the Azure Databricks documentation, e.g., the SCD Type 2 Example, on the workspaces without Unity Catalog enabled, I get the following error: 

felix4572_0-1756710186921.png

The cluster configuration is as follows:

  • DBR 17.1
  • Single node
  • Access mode "No isolation shared"
  • node type ID "Standard_D4ds_v5"
  • Photon not activated

To my understanding, this setup fulfills the requirements for using transformWithStateInPandas (DBR > 16.2, compute using "single user"/"dedicated" or "no isolation shared" access mode, using RocksDB as state store provider).
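The requirements listed above can be written down as a small pre-flight check. This is only a sketch of the checklist as stated in this post, not an official Databricks validation: `ClusterConfig`, `unmet_requirements`, and the constants are hypothetical names, and treating the DBR threshold as inclusive (16.2 and above) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Requirements as summarized in the post; all names here are hypothetical.
SUPPORTED_ACCESS_MODES = {"single user", "dedicated", "no isolation shared"}
MIN_DBR = (16, 2)  # assumption: threshold treated as inclusive


@dataclass
class ClusterConfig:
    dbr_version: str           # e.g. "17.1"
    access_mode: str           # e.g. "No isolation shared"
    state_store_provider: str  # e.g. a provider class name containing "RocksDB"


def parse_dbr(version: str) -> Tuple[int, int]:
    """Turn a version string such as '17.1' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def unmet_requirements(cfg: ClusterConfig) -> List[str]:
    """Return a description of each requirement from the post that cfg does not meet."""
    problems = []
    if parse_dbr(cfg.dbr_version) < MIN_DBR:
        problems.append(f"DBR {cfg.dbr_version} is below 16.2")
    if cfg.access_mode.lower() not in SUPPORTED_ACCESS_MODES:
        problems.append(f"access mode '{cfg.access_mode}' is not supported")
    if "rocksdb" not in cfg.state_store_provider.lower():
        problems.append("state store provider is not RocksDB")
    return problems


# The cluster described in this post passes every check in the list.
cluster = ClusterConfig("17.1", "No isolation shared", "RocksDBStateStoreProvider")
print(unmet_requirements(cluster))
```

An empty list for the configuration above matches the reasoning in the post: on paper, the cluster meets the stated requirements, so the error has to come from somewhere else.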

I also tested other examples; they all result in the same error when trying to start the stream.

The exact same example with identical cluster configuration works in our Unity-enabled workspaces. 

What did I miss? Why is the Spark Connect directory not ready on the workspace that does not have Unity Catalog enabled?

Best and thanks!

Felix

9 REPLIES

-werners-
Esteemed Contributor III

Can you share your stream config (with the write location anonymized, etc.)?

felix4572
New Contributor II

Dear werners,

thank you for your swift response. I use the notebook provided in the example (with a different storage path, of course). The stream config is included.

Best!

szymon_dybczak
Esteemed Contributor III

Maybe it's more of a problem with Databricks Connect, which is not supported on non-UC-enabled clusters:

Compute configuration for Databricks Connect - Azure Databricks | Microsoft Learn

szymon_dybczak_0-1756718310701.png

 

-werners-
Esteemed Contributor III

https://www.databricks.com/blog/introducing-transformwithstate-apache-sparktm-structured-streaming

Here they specifically mention Unity Catalog clusters (see the Availability section), even though the release notes do not mention this as a requirement. But it could very well be the case, since UC is the way to go in the later Databricks releases.

Perhaps someone at Databricks can confirm/deny this?

felix4572
New Contributor II

Dear @szymon_dybczak and @-werners- ,

thank you very much for your responses and references!

@-werners- , thank you for the link to the announcement article. The availability section lists that "No-Isolation and Unity Catalog Dedicated Clusters" are supported. No-isolation access mode is to my understanding not compatible with Unity Catalog. As transformWithStateInPandas supports this access mode, I would assume it can run without Unity Catalog.

This leads me back to the question why the examples are failing in the above-described setup.

I would also be curious to hear a Databricks response on this.

Advika
Databricks Employee

Hello @felix4572!

Could you please share the driver log, or even better, the executor log (without any sensitive details)?

Advika
Databricks Employee

Update: This is working fine with earlier DBR versions, but the issue seems to occur specifically with DBR 17.1.
I’ve flagged this behaviour with the internal team for further investigation.

szymon_dybczak
Esteemed Contributor III

Thanks @Advika for the update. If you hear anything else from the internal team, please let us know 😉

felix4572
New Contributor II

Thanks a lot for working on this, @Advika. For now, the workaround of using a DBR version other than 17.1 works for me. In the mid-term, it would of course be great to use transformWithStateInPandas irrespective of the cluster's DBR version (as long as the minimum requirements are met).
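For anyone applying the same workaround, a small guard at the top of the notebook can flag the affected runtime before the stream is started. This is a minimal sketch: `is_affected_runtime` and `AFFECTED_DBR` are hypothetical names, and it assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable in a form like `17.1`.

```python
import os
from typing import Optional

# Per this thread, the error was observed specifically on DBR 17.1.
AFFECTED_DBR = "17.1"


def is_affected_runtime(version: Optional[str] = None) -> bool:
    """Return True if the given (or detected) runtime version is DBR 17.1.

    Assumption: on a Databricks cluster the version can be read from the
    DATABRICKS_RUNTIME_VERSION environment variable; outside Databricks the
    variable is unset and this function returns False.
    """
    if version is None:
        version = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
    # Match "17.1" exactly or with a further suffix such as "17.1.x", but not "17.10".
    return version == AFFECTED_DBR or version.startswith(AFFECTED_DBR + ".")


if is_affected_runtime():
    print("DBR 17.1 detected: transformWithStateInPandas may fail to start; "
          "consider another supported DBR version.")
```

The check only warns; whether to fail fast instead depends on how the job is deployed.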
