Streaming foreachBatch _jdf jvm attribute not supported

diego_poggioli — Mon, 17 Jun 2024 09:00:40 GMT

I'm trying to perform a merge inside a streaming foreachBatch function using the command:

microBatchDF._jdf.sparkSession().sql(self.merge_query)

The stream runs fine on a Personal cluster, but on a Shared cluster it fails with the following error:

org.apache.spark.api.python.PythonException: Found error inside foreachBatch Python process: Traceback (most recent call last):

pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.

Any idea?

Thanks

Re: Streaming foreachBatch _jdf jvm attribute not supported

holly — Thu, 20 Jun 2024 14:46:53 GMT

Can you share what runtime your cluster is using?

This error doesn't surprise me: Unity Catalog Shared clusters have many security limitations, although the list is shrinking over time. See https://docs.databricks.com/en/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog
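
For what it's worth, a possible workaround (not confirmed in this thread) is to avoid the JVM-backed `_jdf` handle and go through the DataFrame's public `sparkSession` property instead, which should be available in PySpark 3.3 and later, including under Spark Connect. The sketch below assumes that; the `MERGE_QUERY` text, the `target` and `updates` names, and the `merge_batch` function are placeholders for illustration, not the original `self.merge_query`.

```python
# Hypothetical MERGE statement; table and column names are placeholders.
MERGE_QUERY = """
MERGE INTO target AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def merge_batch(micro_batch_df, batch_id):
    """foreachBatch handler that runs the MERGE without touching _jdf."""
    # Expose the micro-batch under the view name the MERGE statement reads from.
    micro_batch_df.createOrReplaceTempView("updates")
    # Assumption: DataFrame.sparkSession is available (PySpark 3.3+).
    # It returns the session that owns this micro-batch, so the temp view
    # registered above is visible to the query.
    micro_batch_df.sparkSession.sql(MERGE_QUERY)
```

The handler would then be wired in with something like `stream_df.writeStream.foreachBatch(merge_batch).start()`, where `stream_df` stands for your own streaming DataFrame.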
