Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Structured streaming hangs when writing, or sometimes when reading, depending on single-user or shared access mode

maoutir
New Contributor

Hi Guys,

I'm new to this community. I am beginning a new project with Azure Databricks and a Python script on my Mac that manipulates data coming from a remote Databricks cluster in single-user data security mode (reading from Delta Sharing tables and inserting into a local Postgres database). I'm facing a strange error when writing a stream from a DataFrame coming from Azure Databricks to a Postgres table:

First, I use databricks-connect==14.2.1 to create a session for our Databricks cluster; below is the code snippet:

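from databricks.connect import DatabricksSession

# __config is built elsewhere in the script (presumably a databricks.sdk.core.Config)
spark = (DatabricksSession.builder.sdkConfig(__config)
         .remote()
         .getOrCreate())
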
Second, I read from the Databricks table using the Delta Sharing protocol with the change data feed option:

df = (spark.readStream.format("deltaSharing")
              .option("readChangeFeed", "true")
              .load(my_table_path))
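
For readers unfamiliar with the `deltaSharing` source: as far as I know, the path passed to `load()` is the location of a Delta Sharing profile file, followed by `#` and the fully qualified `share.schema.table` name. A minimal sketch of building such a path follows; the helper name and all example values are hypothetical and not taken from the post above.

```python
def delta_sharing_table_path(profile_file: str, share: str, schema: str, table: str) -> str:
    """Build '<profile-file>#<share>.<schema>.<table>' for the deltaSharing source."""
    for name, value in (("share", share), ("schema", schema), ("table", table)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{profile_file}#{share}.{schema}.{table}"


# Hypothetical values, for illustration only.
my_table_path = delta_sharing_table_path(
    "/path/to/config.share", "my_share", "my_schema", "my_table"
)
print(my_table_path)
```

The resulting string is what `my_table_path` would need to hold for the `load()` call above.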

Third, I use the above DataFrame to create a writeStream job using the micro-batch feature with the foreachBatch method:

def process_df18(df, batch_id):
    # The implementation is not important here; the debug breakpoint is
    # never reached because of the exception described below.
    pass

(df.writeStream
 .foreachBatch(process_df18)
 .outputMode("update")
 .trigger(processingTime="30 seconds")
 .option('checkpointLocation', f'{__checkpoint_location}')
 .start()
 .awaitTermination())
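
For context, a foreachBatch handler that lands each micro-batch in Postgres could look roughly like the sketch below. This is not the actual `process_df18`; the factory name, JDBC URL, table name, and credentials are all hypothetical, and it assumes the PostgreSQL JDBC driver is available wherever the handler runs.

```python
def make_postgres_writer(jdbc_url: str, table: str, user: str, password: str):
    """Return a foreachBatch handler that appends each micro-batch to a Postgres table."""

    def write_batch(batch_df, batch_id: int) -> None:
        # batch_df is an ordinary (non-streaming) DataFrame for this micro-batch,
        # so the regular batch JDBC writer can be used on it.
        (batch_df.write
         .format("jdbc")
         .option("url", jdbc_url)
         .option("dbtable", table)
         .option("user", user)
         .option("password", password)
         .option("driver", "org.postgresql.Driver")
         .mode("append")
         .save())

    return write_batch
```

It would be passed as `.foreachBatch(make_postgres_writer(...))`. One caveat, as far as I understand Spark Connect (which Databricks Connect builds on): the handler is serialized and executed on the cluster rather than in the local Python process, so a local debugger breakpoint inside it would not be hit, and a `localhost` Postgres URL would refer to the cluster, not the Mac.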

When I run the script I always get this error:

No PYTHON_UID found for the session (a random uuid)

My full dependencies:

alembic==1.13.1
annotated-types==0.7.0
async-timeout==4.0.3
asyncpg==0.29.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
databricks-connect==14.2.1
databricks-sdk==0.28.0
et-xmlfile==1.1.0
google-auth==2.29.0
googleapis-common-protos==1.63.0
greenlet==3.0.3
grpcio==1.64.0
grpcio-status==1.62.2
idna==3.7
kink==0.8.0
loguru==0.7.2
Mako==1.3.5
MarkupSafe==2.1.5
numpy==1.26.4
openpyxl==3.1.3
pandas==2.2.2
protobuf==4.25.3
psycopg2-binary==2.9.9
py4j==0.10.9.7
pyarrow==16.1.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.5.3
pydantic-settings==2.1.0
pydantic_core==2.14.6
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
requests==2.32.3
rsa==4.9
six==1.16.0
SQLAlchemy==2.0.25
tqdm==4.66.4
typing_extensions==4.12.1
tzdata==2024.1
urllib3==2.2.1
win32-setctime==1.1.0

But when I switch to shared mode I get another error:

[UNSUPPORTED_STREAMING_SOURCE_PERMISSION_ENFORCED] Data source deltaSharing is not supported as a streaming source on a shared cluster. SQLSTATE: 0A000

Versions:

Databricks Runtime: 14.2

Local Python installed: 3.10.2

OS: macOS 14.0
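
The versions above can be sanity-checked against each other. Databricks recommends using a Databricks Connect package that matches the cluster's Databricks Runtime version, so a quick comparison of the major and minor numbers helps rule out a mismatch. A minimal sketch, with hypothetical helper names; it only compares version strings and does not query the cluster:

```python
def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '14.2.1' or '14.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def connect_matches_runtime(connect_version: str, runtime_version: str) -> bool:
    """True when the databricks-connect and runtime major.minor versions agree."""
    return major_minor(connect_version) == major_minor(runtime_version)


# Versions reported in this post.
print(connect_matches_runtime("14.2.1", "14.2"))  # True
```

The installed package version can be read with the standard library's `importlib.metadata.version("databricks-connect")`. Since the two versions here agree, a version mismatch looks like an unlikely cause of the error.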

Any help would be greatly appreciated and would save me a lot of trouble.

Thanks.



1 REPLY

Slash
New Contributor

Hi,

I reckon I've seen this problem before and, if I remember correctly, there was an issue related to Databricks Connect and single-user mode.
I think your code will work if you run it directly in a notebook, but will fail with Databricks Connect.
As for the second error, it's pretty self-explanatory: it looks like Delta Sharing is not supported as a streaming source on a shared-mode cluster.
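
If streaming from Delta Sharing is not an option on a given cluster, one possible workaround is to poll the change data feed with plain batch reads and keep track of the last processed version yourself. The sketch below is untested: the function names are hypothetical, and whether batch reads of `deltaSharing` are permitted on a shared cluster is an assumption I have not verified.

```python
def read_changes(spark, table_path: str, starting_version: int):
    """Batch-read the change data feed from starting_version onward."""
    return (spark.read.format("deltaSharing")
            .option("readChangeFeed", "true")
            .option("startingVersion", starting_version)
            .load(table_path))


def next_starting_version(commit_versions, current: int) -> int:
    """Return the version to resume from after processing the given commits.

    If no commits were seen, the resume point stays at `current`.
    """
    return max(commit_versions, default=current - 1) + 1
```

The commit versions would come from the `_commit_version` column of the change feed. Because there is no streaming checkpoint in this approach, the resume point has to be persisted by the script itself. I have also not checked how the source behaves when `startingVersion` is beyond the latest table version, so that case may need explicit handling.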

And since you are using Databricks Runtime >= 14.0, make sure to read about the behavior changes for foreachBatch on compute configured with shared access mode.

Use foreachBatch to write to arbitrary data sinks | Databricks on AWS

PS. I managed to find a post describing an identical case; you can double-check the steps that @Kaniz_Fatma posted there.

Error in Spark Streaming with foreachBatch and Dat... - Databricks Community - 68843
