
Upgrading from Spark 2.4 to 3.2: Recursive view errors when using createOrReplaceTempView

RasmusOlesen
New Contributor III

We get errors like this:

Recursive view `x` detected (cycle: `x` -> `x`)

... in long-standing code that worked just fine on Spark 2.4.5 (Runtime 6.4), once we run it on a Spark 3.2 cluster (Runtime 10.0).

It happens whenever we have,

# x is a DataFrame
x.createOrReplaceTempView('x')

... in order to use Spark SQL such as,

y = spark.sql("""
select ...
from x
...
""")

-------

It seems that the name passed to

x.createOrReplaceTempView('x')

... needs to be globally unique and cannot be overwritten (even though the function createOrReplaceTempView has the word "replace" in it).
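
To illustrate, here is a minimal sketch of the pattern that seems to trigger it for us (the toy DataFrame is just for illustration):

x = spark.range(5)
x.createOrReplaceTempView('x')

# The new DataFrame's plan references the temp view 'x' ...
y = spark.sql("select * from x where id > 1")

# ... so re-registering it under the same name makes the view definition
# refer to itself, and Spark 3.1+ raises:
# Recursive view `x` detected (cycle: `x` -> `x`)
y.createOrReplaceTempView('x')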

Is there a general fix for this issue without setting the global "spark.<something>.legacy" option? (We would prefer to avoid that.)

If there is no fix, our current cumbersome workaround is to rewrite every

x.createOrReplaceTempView('x')

to

z = f'x_{timestamp_as_text_with_underscores}'
x.createOrReplaceTempView(z)

... and then, of course, use it as

y = spark.sql(f"""
select ...
from {z}
...
""")

... but we would prefer a more elegant solution, if there is one?
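
Concretely, the makeshift version might look something like this sketch (register_unique_view is just an illustrative name we made up, not a Spark API):

from datetime import datetime

def register_unique_view(df, base_name):
    # Register df under a timestamped, effectively unique view name and
    # return that name for use in Spark SQL.
    name = f"{base_name}_{datetime.now().strftime('%Y%m%d_%H%M%S_%f')}"
    df.createOrReplaceTempView(name)
    return name

z = register_unique_view(x, 'x')
y = spark.sql(f"select * from {z}")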

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Strange, it should work as it is. The only other idea I have is to check createOrReplaceGlobalTempView, which uses the global temp database. createOrReplaceTempView uses the "default" database; maybe there were some issues with that during the migration? You can also create a new Spark database.

The functions below can also be useful for diagnosing the problem:

spark.catalog.listDatabases()        # list the databases visible to this session
spark.catalog.currentDatabase()      # show which database is currently in use
spark.catalog.listTables('default')  # list tables and temp views in 'default'
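
For example, a quick sketch of the global variant (global temp views live in the system-preserved global_temp database and have to be qualified when queried):

x.createOrReplaceGlobalTempView('x')  # registered in the global_temp database
y = spark.sql("select * from global_temp.x")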

RasmusOlesen
New Contributor III

Thanks a lot Hubert,

Could it be a bug where Spark 3.2's createOrReplaceTempView is accidentally using the global database rather than the "default" one?

We implemented a makeshift workaround where the names we register are made unique via timestamps, and we then use these dynamic names in the Spark SQL.

Thanks again for the expressions to check the registered names in the databases!

RobinL
New Contributor II

I'm experiencing the same problem on Spark 3.2.0 (not Databricks). This was the only other reference to this problem I've found online. I've raised an issue in the main Spark project here: https://issues.apache.org/jira/browse/SPARK-37690. There's a reproducible example that works in 2.x and 3.2.1, so I'm fairly confident this is a bug introduced in 3.2.0.

arkrish
New Contributor II

This is a breaking change introduced in Spark 3.1.

From the Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org):

In Spark 3.1, the temporary view will have the same behaviors as the permanent view, i.e. capture and store runtime SQL configs, SQL text, catalog and namespace. The captured view properties will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set spark.sql.legacy.storeAnalyzedPlanForView to true.

I've tried setting spark.sql.legacy.storeAnalyzedPlanForView to true and was able to restore the old behaviour.
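
For reference, setting it from a notebook looks like this minimal sketch (on Databricks it can also go in the cluster's Spark config):

# Restore the pre-3.1 temp view behaviour described in the migration guide
spark.conf.set("spark.sql.legacy.storeAnalyzedPlanForView", "true")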
