<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: transformWithStateInPandas throws &amp;quot;Spark connect directory is not ready&amp;quot; error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130345#M48763</link>
    <description>&lt;P&gt;Maybe it's more of a problem with Databricks Connect, which is not supported on non-UC-enabled clusters.&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect/cluster-config" target="_blank" rel="noopener"&gt;Compute configuration for Databricks Connect - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1756718310701.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19532i0C484DA4305B3112/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1756718310701.png" alt="szymon_dybczak_0-1756718310701.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 01 Sep 2025 09:26:04 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-09-01T09:26:04Z</dc:date>
    <item>
      <title>transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130314#M48751</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We use arbitrary stateful aggregations in our data processing streams on Azure Databricks and would like to migrate from &lt;STRONG&gt;applyInPandasWithState&lt;/STRONG&gt; to &lt;STRONG&gt;transformWithStateInPandas&lt;/STRONG&gt;. We use the Python API throughout our solution, and some of our workspaces do NOT yet have Unity Catalog enabled.&lt;/P&gt;&lt;P&gt;When I try to run the examples provided in the Azure Databricks documentation, e.g., the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/stateful-applications/examples#slowly-changing-dimension-scd-type-2" target="_self"&gt;SCD Type 2 Example&lt;/A&gt;, on the workspaces without Unity Catalog enabled, I get the following error:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="felix4572_0-1756710186921.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19524i1D6CA9309D1C4E83/image-size/medium?v=v2&amp;amp;px=400" role="button" title="felix4572_0-1756710186921.png" alt="felix4572_0-1756710186921.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The cluster configuration is as follows:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;DBR 17.1&lt;/LI&gt;&lt;LI&gt;Single node&lt;/LI&gt;&lt;LI&gt;Access mode "No isolation shared"&lt;/LI&gt;&lt;LI&gt;Node type ID "Standard_D4ds_v5"&lt;/LI&gt;&lt;LI&gt;Photon not activated&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;To my understanding, this setup fulfills the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/stateful-applications/#requirements" target="_self"&gt;requirements&lt;/A&gt; for using transformWithStateInPandas (DBR 16.2 or above, compute using "single user"/"dedicated" or "no isolation shared" access mode, and RocksDB as the state store provider).&lt;/P&gt;&lt;P&gt;I also tested other examples; they all result in the same error when the stream starts.&lt;/P&gt;&lt;P&gt;The exact same example with an identical cluster configuration works in our Unity Catalog-enabled workspaces.&lt;/P&gt;&lt;P&gt;What did I miss? Why is the Spark Connect directory not ready on the workspace that does not have Unity Catalog enabled?&lt;/P&gt;&lt;P&gt;Best and thanks!&lt;/P&gt;&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 07:20:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130314#M48751</guid>
      <dc:creator>felix4572</dc:creator>
      <dc:date>2025-09-01T07:20:30Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130315#M48752</link>
      <description>&lt;P&gt;Can you share your stream config (with the write location anonymized, etc.)?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 07:43:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130315#M48752</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-09-01T07:43:43Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130328#M48756</link>
      <description>&lt;P&gt;Dear werners,&lt;/P&gt;&lt;P&gt;thank you for your swift response. I use the &lt;A href="https://docs.databricks.com/aws/en/notebooks/source/streaming/tws-scd2-python.html" target="_self"&gt;notebook&lt;/A&gt; provided in the example (with a different storage path, of course). The stream config is included.&lt;/P&gt;&lt;P&gt;Best!&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 08:39:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130328#M48756</guid>
      <dc:creator>felix4572</dc:creator>
      <dc:date>2025-09-01T08:39:47Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130335#M48761</link>
      <description>&lt;P&gt;&lt;A href="https://www.databricks.com/blog/introducing-transformwithstate-apache-sparktm-structured-streaming" target="_self"&gt;https://www.databricks.com/blog/introducing-transformwithstate-apache-sparktm-structured-streaming&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Here they specifically mention Unity Catalog clusters (see the Availability section), even though the release notes do not list this as a requirement. It could very well be the case, though, since UC is the way to go in later Databricks releases.&lt;/P&gt;&lt;P&gt;Perhaps someone at Databricks can confirm or deny this?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 08:58:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130335#M48761</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-09-01T08:58:32Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130345#M48763</link>
      <description>&lt;P&gt;Maybe it's more of a problem with Databricks Connect, which is not supported on non-UC-enabled clusters.&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect/cluster-config" target="_blank" rel="noopener"&gt;Compute configuration for Databricks Connect - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1756718310701.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19532i0C484DA4305B3112/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1756718310701.png" alt="szymon_dybczak_0-1756718310701.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 09:26:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130345#M48763</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-01T09:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130349#M48765</link>
      <description>&lt;P&gt;Dear &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;thank you very much for your responses and references!&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;, thank you for the link to the announcement article. The Availability section states that "&lt;SPAN&gt;No-Isolation and Unity Catalog Dedicated Clusters" are supported. To my understanding, the no-isolation access mode is not compatible with Unity Catalog. As transformWithStateInPandas supports this access mode, I would assume it can run without Unity Catalog.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;This leads me back to the question of why the examples are failing in the setup described above.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I would also be curious to hear a Databricks response on this.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 10:07:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130349#M48765</guid>
      <dc:creator>felix4572</dc:creator>
      <dc:date>2025-09-01T10:07:32Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130416#M48785</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/181943"&gt;@felix4572&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Could you please share the driver log, or even better, the executor log (without any sensitive details)?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 19:23:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/130416#M48785</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-09-01T19:23:03Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131238#M49014</link>
      <description>&lt;P&gt;Update: This is working fine with earlier DBR versions, but the issue seems to occur specifically with DBR 17.1.&lt;BR /&gt;I’ve flagged this behaviour with the internal team for further investigation.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Sep 2025 15:09:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131238#M49014</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-09-08T15:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131258#M49023</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152834"&gt;@Advika&lt;/a&gt;&amp;nbsp;for the update. If you hear anything else from the internal team, please let us know &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Sep 2025 17:36:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131258#M49023</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-08T17:36:44Z</dc:date>
    </item>
    <item>
      <title>Re: transformWithStateInPandas throws "Spark connect directory is not ready" error</title>
      <link>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131318#M49042</link>
      <description>&lt;P&gt;Thanks a lot for working on this,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152834"&gt;@Advika&lt;/a&gt;. For now, the workaround of using DBR versions other than 17.1 works for me. In the medium term, it would of course be great to be able to use transformWithStateInPandas irrespective of the cluster's DBR version (as long as the minimum requirements are met).&lt;/P&gt;</description>
      <pubDate>Tue, 09 Sep 2025 04:48:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/transformwithstateinpandas-throws-quot-spark-connect-directory/m-p/131318#M49042</guid>
      <dc:creator>felix4572</dc:creator>
      <dc:date>2025-09-09T04:48:03Z</dc:date>
    </item>
  </channel>
</rss>

