<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Upgrading from Spark 2.4 to 3.2: Recursive view errors when using in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12398#M7213</link>
    <description>&lt;P&gt;This is a breaking change introduced in Spark 3.1.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From &lt;A href="https://spark.apache.org/docs/3.1.1/sql-migration-guide.html" alt="https://spark.apache.org/docs/3.1.1/sql-migration-guide.html" target="_blank"&gt;Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org)&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In Spark 3.1, a temporary view has the same behavior as a permanent view, i.e. it captures and stores runtime SQL configs, SQL text, catalog and namespace. The captured view properties are applied during the parsing and analysis phases of view resolution. To restore the behavior before Spark 3.1, &lt;B&gt;you can set spark.sql.legacy.storeAnalyzedPlanForView to true&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've tried setting &lt;B&gt;spark.sql.legacy.storeAnalyzedPlanForView to true&lt;/B&gt; and was able to restore the old behaviour.&lt;/P&gt;</description>
    <pubDate>Wed, 22 Dec 2021 14:32:37 GMT</pubDate>
    <dc:creator>arkrish</dc:creator>
    <dc:date>2021-12-22T14:32:37Z</dc:date>
    <item>
      <title>Upgrading from Spark 2.4 to 3.2: Recursive view errors when using</title>
      <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12394#M7209</link>
      <description>&lt;P&gt;We get errors like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Recursive view `x` detected (cycle: `x` -&amp;gt; `x`)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... in long-standing code that has worked just fine on Spark 2.4.5 (Runtime 6.4), when we run it on a Spark 3.2 cluster (Runtime 10.0).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It happens whenever we have:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;x is a DataFrame&amp;gt;&lt;/P&gt;&lt;P&gt;x.createOrReplaceTempView('x')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... in order to use Spark SQL such as:&lt;/P&gt;&lt;P&gt;y = spark.sql("""&lt;/P&gt;&lt;P&gt;    select ...&lt;/P&gt;&lt;P&gt;    from x&lt;/P&gt;&lt;P&gt;    ...&lt;/P&gt;&lt;P&gt;""")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;-------&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It seems that the name ('x') passed to&lt;/P&gt;&lt;P&gt;x.createOrReplaceTempView('x')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... needs to be globally unique and cannot be overwritten (even though the function "createOrReplaceTempView" has the word "replace" in it).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a general fix for this issue without setting the global "spark.&amp;lt;something&amp;gt;.legacy" option? (We would prefer to avoid that.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If there is no fix, our current cumbersome workaround would be to rewrite every&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;x.createOrReplaceTempView('x')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;to&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;z = f'x_{timestamp_as_text_with_underscores}'&lt;/P&gt;&lt;P&gt;x.createOrReplaceTempView(z)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... and then of course use it as:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;y = spark.sql(f"""&lt;/P&gt;&lt;P&gt;select ...&lt;/P&gt;&lt;P&gt;from {z}&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;""")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;... but we would prefer a more elegant solution, if there is one?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Oct 2021 13:10:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12394#M7209</guid>
      <dc:creator>RasmusOlesen</dc:creator>
      <dc:date>2021-10-26T13:10:10Z</dc:date>
    </item>
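The timestamp-suffix workaround described in the post above can be sketched as a small helper. This is a sketch, not code from the thread; `unique_view_name` is a hypothetical name, and the registered name only needs to be a valid SQL identifier that is unique per registration:

```python
from datetime import datetime, timezone

def unique_view_name(base: str) -> str:
    """Return `base` suffixed with an underscore-separated UTC timestamp,
    suitable for use as a unique temp-view name."""
    ts = datetime.now(timezone.utc).strftime("%Y_%m_%d_%H_%M_%S_%f")
    return f"{base}_{ts}"

# Intended Spark usage (sketch; assumes an existing SparkSession `spark`
# and a DataFrame `x`):
#   z = unique_view_name("x")
#   x.createOrReplaceTempView(z)
#   y = spark.sql(f"select ... from {z}")
```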
    <item>
      <title>Re: Upgrading from Spark 2.4 to 3.2: Recursive view errors when using</title>
      <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12395#M7210</link>
      <description>&lt;P&gt;Strange, it should work as is. The only other idea I have is to check createOrReplaceGlobalTempView, which uses the global temp database; createOrReplaceTempView uses the "default" database, so maybe something went wrong with it during migration? You can also create a new Spark database.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The functions below can also be useful for diagnosing the problem:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.catalog.listDatabases()
spark.catalog.currentDatabase()
spark.catalog.listTables('default')&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 26 Oct 2021 13:28:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12395#M7210</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-26T13:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: Upgrading from Spark 2.4 to 3.2: Recursive view errors when using</title>
      <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12396#M7211</link>
      <description>&lt;P&gt;Thanks a lot Hubert,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could it be a bug where Spark 3.2's&amp;nbsp;createOrReplaceTempView is accidentally using the global database instead of the "default" one?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We implemented a makeshift workaround where the names we register are made unique via timestamps, and we then use those dynamic names in the Spark SQL.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks again for the expressions to check the registered names in the databases!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Oct 2021 09:40:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12396#M7211</guid>
      <dc:creator>RasmusOlesen</dc:creator>
      <dc:date>2021-10-27T09:40:27Z</dc:date>
    </item>
    <item>
      <title>Re: Upgrading from Spark 2.4 to 3.2: Recursive view errors when using</title>
      <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12397#M7212</link>
      <description>&lt;P&gt;I'm experiencing the same problem on Spark 3.2.0 (not Databricks). This was the only other reference to this problem I've found online. I've raised an issue in the main Spark project here: &lt;A href="https://issues.apache.org/jira/browse/SPARK-37690" alt="https://issues.apache.org/jira/browse/SPARK-37690" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-37690&lt;/A&gt;. There's a reproducible example that works in 2.x and 3.2.1, so I'm fairly confident this is a bug introduced in 3.2.0.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Dec 2021 07:54:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12397#M7212</guid>
      <dc:creator>RobinL</dc:creator>
      <dc:date>2021-12-20T07:54:11Z</dc:date>
    </item>
    <item>
      <title>Re: Upgrading from Spark 2.4 to 3.2: Recursive view errors when using</title>
      <link>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12398#M7213</link>
      <description>&lt;P&gt;This is a breaking change introduced in Spark 3.1.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From &lt;A href="https://spark.apache.org/docs/3.1.1/sql-migration-guide.html" alt="https://spark.apache.org/docs/3.1.1/sql-migration-guide.html" target="_blank"&gt;Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org)&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In Spark 3.1, a temporary view has the same behavior as a permanent view, i.e. it captures and stores runtime SQL configs, SQL text, catalog and namespace. The captured view properties are applied during the parsing and analysis phases of view resolution. To restore the behavior before Spark 3.1, &lt;B&gt;you can set spark.sql.legacy.storeAnalyzedPlanForView to true&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've tried setting &lt;B&gt;spark.sql.legacy.storeAnalyzedPlanForView to true&lt;/B&gt; and was able to restore the old behaviour.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Dec 2021 14:32:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/upgrading-from-spark-2-4-to-3-2-recursive-view-errors-when-using/m-p/12398#M7213</guid>
      <dc:creator>arkrish</dc:creator>
      <dc:date>2021-12-22T14:32:37Z</dc:date>
    </item>
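The legacy flag mentioned in the post above is a runtime SQL config, so it can be set per-session rather than cluster-wide. A minimal sketch, assuming an existing PySpark 3.1+ SparkSession named `spark`:

```python
# Sketch: restore the pre-3.1 temp-view behavior for the current session only.
# Assumes an existing SparkSession `spark` on Spark 3.1 or later.
spark.conf.set("spark.sql.legacy.storeAnalyzedPlanForView", "true")

# With the flag enabled, re-registering a view name whose new plan reads
# from the old view of the same name no longer trips the recursive-view
# check, e.g. the pattern from the original question:
#   x = spark.sql("select * from x")   # derive a DataFrame from the view
#   x.createOrReplaceTempView("x")     # raises "Recursive view `x`" without the flag
```

Note that this is the legacy escape hatch the thread's question author wanted to avoid; the alternative remains registering each view under a unique name.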
  </channel>
</rss>

