
Error in Spark Streaming with foreachBatch and Databricks Connect

TWib
New Contributor III

The following code throws an error when run locally from my IDE with Databricks Connect.

 

 

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
spark.sql("CREATE DATABASE IF NOT EXISTS sample")
spark.sql("DROP TABLE IF EXISTS sample.mvp")
spark.sql("DROP TABLE IF EXISTS sample.mvp_from_foreach_batch")

data = [("John", "Doe", 30), ("Jane", "Doe", 25), ("Mike", "Johnson", 35)]
df = spark.createDataFrame(data, ["FirstName", "LastName", "Age"])
df.write.format("delta").mode("overwrite").saveAsTable("sample.mvp")

def foreach_batch_function(df, epoch_id):
    df.write.format("delta").mode("overwrite").saveAsTable(
        "sample.mvp_from_foreach_batch"
    )

spark.readStream.table("sample.mvp").writeStream.foreachBatch(
    foreach_batch_function
).outputMode("append").trigger(availableNow=True).start().awaitTermination()

 

 

This code only works in notebooks or directly on a cluster. It will not run locally in an IDE with Databricks Connect.

Instead, the following error is raised:

pyspark.errors.exceptions.connect.SparkException: No PYTHON_UID found for session (some uid)

In general, Databricks Connect works fine for all other cases.

My local environment:

  • databricks-connect 14.3.1
  • databricks-sdk 0.26.0
  • pyspark 3.5.1
  • Python 3.11.4
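
For anyone trying to reproduce this, the local versions listed above can be collected programmatically instead of by hand. The following is a minimal sketch using only the standard library; the helper name `collect_versions` is made up for illustration and is not part of any Databricks tooling.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {name: installed version} for each distribution name.

    Distributions that are not installed are reported as 'not installed'
    instead of raising. The local Python version is added under 'python'.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    versions["python"] = platform.python_version()
    return versions


if __name__ == "__main__":
    # Distribution names taken from the list in the post above.
    report = collect_versions(["databricks-connect", "databricks-sdk", "pyspark"])
    for name, version in report.items():
        print(f"{name} {version}")
```

The printed lines can be pasted directly into a bug report or support ticket.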

Cluster is running on:

  • Databricks Runtime 15.1 (includes Apache Spark 3.5.0, Scala 2.12)
  • Single User Mode
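
Note that the client (databricks-connect 14.3.1) and the cluster runtime (15.1) listed above are on different versions. Nothing in this thread establishes whether that gap is related to the PYTHON_UID error, but it may be worth stating explicitly when reporting the problem. A small sketch for comparing the two version strings follows; the function names are hypothetical and the comparison looks only at major and minor numbers.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string such as '14.3.1' or '15.1' into (major, minor)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def describe_version_gap(client: str, runtime: str) -> str:
    """Describe how the client version relates to the cluster runtime version."""
    client_mm = major_minor(client)
    runtime_mm = major_minor(runtime)
    if client_mm == runtime_mm:
        return "client and cluster runtime match"
    if client_mm < runtime_mm:
        return "client is older than cluster runtime"
    return "client is newer than cluster runtime"


# Values from the post above: databricks-connect 14.3.1, cluster runtime 15.1.
print(describe_version_gap("14.3.1", "15.1"))
```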
6 REPLIES

Kaniz
Community Manager

Hi @TWib

Your local environment details look fine but double-check the points above to resolve the error. If you need further assistance, feel free to ask! 

TWib
New Contributor III

The only thing that differs is the Python version: 3.11.0 on the cluster vs. 3.11.4 locally. This shouldn't be an issue.

Does this code run for you?

 

Kaniz
Community Manager

Hi @TWib, I tried your code in DBR 14.3 LTS ML.

 

[Screenshot attached: Kaniz_0-1716298655146.png]

 

TWib
New Contributor III

@Kaniz Notebooks in the Databricks workspace also work for me (this was never the problem).

Locally in VS Code with Databricks Connect, it fails.

TWib
New Contributor III

One more finding: it seems to occur only on single user clusters.

TWib
New Contributor III

This is still unresolved. Internally, we have dropped streaming for now because of the many problems; another ticket with support is open.

Currently, I do not recommend using streaming with foreachBatch if you want to use Databricks Connect.
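
For the minimal example in the original post, dropping streaming could look like the sketch below: a plain batch read and overwrite. This has not been verified in this thread, and it is not a general replacement for foreachBatch, since it has no checkpointing and rereads the whole source table on every run. The function name `copy_table_as_batch` is hypothetical; `spark` is assumed to be the `DatabricksSession` created in the original post.

```python
def copy_table_as_batch(spark, source_table: str, target_table: str) -> None:
    """Overwrite target_table with the current contents of source_table.

    Uses a batch read instead of readStream/foreachBatch, so there is no
    incremental processing: the full source table is read on every call.
    """
    (
        spark.read.table(source_table)
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(target_table)
    )


# Usage with the session and table names from the original post:
# copy_table_as_batch(spark, "sample.mvp", "sample.mvp_from_foreach_batch")
```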
