Databricks Community

diego_poggioli · 06-17-2024

I'm trying to perform a merge inside a streaming foreachBatch function using the following command:

microBatchDF._jdf.sparkSession().sql(self.merge_query)

The stream runs fine on a Personal cluster, but on a Shared cluster it fails with the following error:

org.apache.spark.api.python.PythonException: Found error inside foreachBatch Python process: Traceback (most recent call last):

pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

Any ideas?

Thanks
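
For reference, here is a minimal sketch of the same call written against the public `DataFrame.sparkSession` property (available in PySpark 3.3 and later) instead of the private `_jdf` attribute named in the error. The helper names `make_merge_handler` and `merge_batch` are hypothetical, and `merge_query` stands in for the `self.merge_query` string from the post. Whether this resolves the failure on a given Shared cluster is an assumption that has not been verified here.

```python
def make_merge_handler(merge_query):
    """Build a foreachBatch callback that runs merge_query for each micro-batch.

    Hypothetical helper: merge_query is the SQL MERGE statement as a string.
    """

    def merge_batch(micro_batch_df, batch_id):
        # Use the public sparkSession property rather than _jdf, which
        # depends on the JVM and is rejected under Spark Connect.
        micro_batch_df.sparkSession.sql(merge_query)

    return merge_batch


# Possible usage, assuming `stream_df` is an existing streaming DataFrame:
# stream_df.writeStream.foreachBatch(make_merge_handler(merge_query)).start()
```

If the merge query reads the micro-batch through a temporary view, that view would still need to be registered inside the callback before the query runs.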