<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic structured streaming hangs when writing or sometimes reading depends on SINGLE USER or shared mode in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/structured-streaming-hangs-when-writing-or-sometimes-reading/m-p/76771#M35314</link>
    <pubDate>Thu, 04 Jul 2024 15:13:57 GMT</pubDate>
    <dc:creator>maoutir</dc:creator>
    <dc:date>2024-07-04T15:13:57Z</dc:date>
    <item>
      <title>structured streaming hangs when writing or sometimes reading depends on SINGLE USER or shared mode</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-hangs-when-writing-or-sometimes-reading/m-p/76771#M35314</link>
      <description>&lt;P&gt;Hi Guys,&lt;/P&gt;&lt;P&gt;I'm new to this community. I am starting a new project with Azure Databricks and a Python script on my Mac that reads from Delta Sharing tables and inserts into a local Postgres database. The data comes from a remote Databricks cluster with &lt;STRONG&gt;single-user&lt;/STRONG&gt; data security mode. I'm facing a strange error when writing a stream from a DataFrame coming from Azure Databricks to a Postgres table:&lt;/P&gt;&lt;P&gt;First, I use databricks-connect==14.2.1 to create a session for our Databricks cluster; the code snippet is below:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from databricks.connect import DatabricksSession

spark = (DatabricksSession.builder.sdkConfig(__config)
                 .remote()
                 .getOrCreate())&lt;/LI-CODE&gt;&lt;P&gt;Second, I read from the Databricks table using &lt;STRONG&gt;the deltaSharing protocol with the change data feed option&lt;/STRONG&gt;:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (spark.readStream.format("deltaSharing")
              .option("readChangeFeed", "true")
              .load(my_table_path))&lt;/LI-CODE&gt;&lt;P&gt;Third, I use the above DataFrame to create a writeStream job using the micro-batch feature with the foreachBatch method:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def process_df18(df, batch_id):
    # The implementation is not important here; a debug breakpoint placed in
    # this function is never reached because of the exception described below.
    pass

# The handler is defined before the stream starts so the name exists when passed in.
(df.writeStream
    .foreachBatch(process_df18)
    .outputMode("update")
    .trigger(processingTime="30 seconds")
    .option('checkpointLocation', f'{__checkpoint_location}')
    .start()
    .awaitTermination())&lt;/LI-CODE&gt;&lt;P&gt;When I run the script I always get this error:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;No PYTHON_UID found for the session (a random uuid)&lt;/P&gt;&lt;P&gt;My full dependencies:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;alembic==1.13.1
annotated-types==0.7.0
async-timeout==4.0.3
asyncpg==0.29.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
databricks-connect==14.2.1
databricks-sdk==0.28.0
et-xmlfile==1.1.0
google-auth==2.29.0
googleapis-common-protos==1.63.0
greenlet==3.0.3
grpcio==1.64.0
grpcio-status==1.62.2
idna==3.7
kink==0.8.0
loguru==0.7.2
Mako==1.3.5
MarkupSafe==2.1.5
numpy==1.26.4
openpyxl==3.1.3
pandas==2.2.2
protobuf==4.25.3
psycopg2-binary==2.9.9
py4j==0.10.9.7
pyarrow==16.1.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.5.3
pydantic-settings==2.1.0
pydantic_core==2.14.6
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
requests==2.32.3
rsa==4.9
six==1.16.0
SQLAlchemy==2.0.25
tqdm==4.66.4
typing_extensions==4.12.1
tzdata==2024.1
urllib3==2.2.1
win32-setctime==1.1.0&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;But&lt;/STRONG&gt; when I switch to &lt;STRONG&gt;shared mode&lt;/STRONG&gt; I get another error:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;[UNSUPPORTED_STREAMING_SOURCE_PERMISSION_ENFORCED] Data source deltaSharing is not supported as a streaming source on a shared cluster. SQLSTATE: 0A000&lt;/P&gt;&lt;P&gt;Versions:&lt;/P&gt;&lt;P&gt;Databricks Runtime: 14.2&lt;/P&gt;&lt;P&gt;Local Python version: 3.10.2&lt;/P&gt;&lt;P&gt;OS: macOS 14.0&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 15:13:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-hangs-when-writing-or-sometimes-reading/m-p/76771#M35314</guid>
      <dc:creator>maoutir</dc:creator>
      <dc:date>2024-07-04T15:13:57Z</dc:date>
    </item>
    <item>
      <title>Re: structured streaming hangs when writing or sometimes reading depends on SINGLE USER or shared mo</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-hangs-when-writing-or-sometimes-reading/m-p/77033#M35371</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I reckon I've seen this problem before and, if I remember correctly, there was an issue related to Databricks Connect and single-user mode.&lt;BR /&gt;I think your code will work if you run it directly in a notebook, but will fail with Databricks Connect.&lt;BR /&gt;The second error is pretty self-explanatory: it looks like Delta Sharing is not supported as a streaming source on a shared mode cluster.&lt;BR /&gt;&lt;BR /&gt;And since you are using Databricks Runtime &amp;gt;= 14.0, make sure to read about the behavior changes for foreachBatch on compute configured with shared access mode.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/structured-streaming/foreach.html#behavior-changes-for-foreachbatch-in-databricks-runtime-140" target="_blank" rel="noopener"&gt;Use foreachBatch to write to arbitrary data sinks | Databricks on AWS&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;PS. I managed to find a post describing an identical case; you can double-check the steps that&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;posted there.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/error-in-spark-streaming-with-foreachbatch-and-databricks/td-p/68843" target="_blank"&gt;Error in Spark Streaming with foreachBatch and Dat... - Databricks Community - 68843&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jul 2024 08:55:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-hangs-when-writing-or-sometimes-reading/m-p/77033#M35371</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-07T08:55:24Z</dc:date>
    </item>
  </channel>
</rss>

