transformWithStateInPandas throws "Spark connect directory is not ready" error

felix4572 — Mon, 01 Sep 2025 07:20:30 GMT

Hello,

We use arbitrary stateful aggregations in our data processing streams on Azure Databricks and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We use the Python API throughout our solution, and some of our workspaces do NOT yet have Unity Catalog enabled.

When I try to run the examples provided in the Azure Databricks documentation (e.g., the SCD Type 2 example) on the workspaces without Unity Catalog enabled, I get the following error: "Spark connect directory is not ready".

The cluster configuration is as follows:

DBR 17.1
Single node
Access mode "No isolation shared"
node type ID "Standard_D4ds_v5"
Photon not activated

To my understanding, this setup fulfills the requirements for using transformWithStateInPandas (DBR 16.2 or above, compute using "single user"/"dedicated" or "no isolation shared" access mode, and RocksDB as the state store provider).
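The requirement check described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the names `parse_dbr`, `meets_requirements`, `MIN_DBR`, and `SUPPORTED_ACCESS_MODES` are hypothetical, the access-mode strings are simply the labels quoted in this post, and the minimum version is read as "16.2 and above". It does not query the cluster; the values are passed in by hand.

```python
import re

# Assumption: labels as quoted in this thread, compared case-insensitively.
SUPPORTED_ACCESS_MODES = {"single user", "dedicated", "no isolation shared"}
# Assumption: the minimum runtime is interpreted as "16.2 and above".
MIN_DBR = (16, 2)


def parse_dbr(version):
    """Extract (major, minor) from a runtime string such as '17.1'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized DBR version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirements(dbr_version, access_mode, provider_class):
    """Return True if all three requirements listed in the post hold."""
    return (
        parse_dbr(dbr_version) >= MIN_DBR
        and access_mode.strip().lower() in SUPPORTED_ACCESS_MODES
        # Only the class name suffix is checked, so any package prefix is accepted.
        and provider_class.endswith("RocksDBStateStoreProvider")
    )


# The cluster described above, with a hypothetical provider class prefix.
print(meets_requirements("17.1", "No isolation shared",
                         "example.RocksDBStateStoreProvider"))
```

By this check the cluster above qualifies, which is why the error is unexpected.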

I also tested other examples; they all result in the same error when trying to start the stream.

The exact same example with identical cluster configuration works in our Unity-enabled workspaces.

What did I miss? Why is the Spark Connect directory not ready on the workspace that does not have Unity Catalog enabled?

Best and thanks!

Felix

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

-werners- — Mon, 01 Sep 2025 07:43:43 GMT

Can you share your stream config (write location anonymized, etc.)?

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

felix4572 — Mon, 01 Sep 2025 08:39:47 GMT

Dear werners,

thank you for your swift response. I use the notebook provided in the example (with a different storage path, of course). The stream config is included.

Best!

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

-werners- — Mon, 01 Sep 2025 08:58:32 GMT

https://www.databricks.com/blog/introducing-transformwithstate-apache-sparktm-structured-streaming

Here they specifically mention Unity Catalog clusters (see Availability section), even though in the release notes this is not mentioned as a requirement. But it could very well be the case since UC is the way to go in the later Databricks releases.

Perhaps someone at Databricks can confirm/deny this?

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

szymon_dybczak — Mon, 01 Sep 2025 09:26:04 GMT

Maybe it's more of a problem with Databricks Connect, which is not supported on clusters without Unity Catalog enabled:

Compute configuration for Databricks Connect - Azure Databricks | Microsoft Learn

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

felix4572 — Mon, 01 Sep 2025 10:07:32 GMT

Dear @szymon_dybczak and @-werners- ,

thank you very much for your responses and references!

@-werners- , thank you for the link to the announcement article. The availability section lists "No-Isolation and Unity Catalog Dedicated Clusters" as supported. To my understanding, the no-isolation access mode is not compatible with Unity Catalog. Since transformWithStateInPandas supports this access mode, I would assume it can run without Unity Catalog.

This leads me back to the question of why the examples are failing in the setup described above.

I would also be curious to hear a Databricks response on this.

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

Advika — Mon, 01 Sep 2025 19:23:03 GMT

Hello @felix4572!

Could you please share the driver log, or even better, the executor log (without any sensitive details)?

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

Advika — Mon, 08 Sep 2025 15:09:31 GMT

Update: This is working fine with earlier DBR versions, but the issue seems to occur specifically with DBR 17.1.
I’ve flagged this behaviour with the internal team for further investigation.

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

szymon_dybczak — Mon, 08 Sep 2025 17:36:44 GMT

Thanks @Advika for the update. If you hear anything else from the internal team, please let us know 😉

Re: transformWithStateInPandas throws "Spark connect directory is not ready" error

felix4572 — Tue, 09 Sep 2025 04:48:03 GMT

Thanks a lot for working on this, @Advika. For now, the workaround of using a DBR version other than 17.1 works for me. In the medium term, it would of course be great to use transformWithStateInPandas irrespective of the cluster's DBR version (as long as the minimum requirements are met).
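In case it helps others, a minimal sketch of a guard for this workaround is below. It assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable; the names `KNOWN_AFFECTED` and `is_affected_runtime` are hypothetical, and 17.1 is the only version reported as failing in this thread, so the set may need adjusting.

```python
import os
import re

# Assumption: only DBR 17.1 has been reported as affected in this thread.
KNOWN_AFFECTED = {(17, 1)}


def is_affected_runtime(version):
    """Return True if the version string names a runtime listed as affected."""
    match = re.match(r"(\d+)\.(\d+)", version or "")
    if match is None:
        # Unknown or missing version: do not flag it.
        return False
    return (int(match.group(1)), int(match.group(2))) in KNOWN_AFFECTED


# Assumption: this variable is set on the cluster; it is absent elsewhere.
current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if is_affected_runtime(current):
    print(f"DBR {current} is listed as affected; consider another runtime.")
else:
    print("Runtime not listed as affected (or not running on a cluster).")
```

Running this at the top of the notebook only prints a hint; it does not change the cluster or stop the stream.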