Upgrading from Spark 2.4 to 3.2: Recursive view errors when using createOrReplaceTempView

RasmusOlesen
New Contributor III

We get errors like this:

Recursive view `x` detected (cycle: `x` -> `x`)

... in long-standing code that has worked just fine on Spark 2.4.5 (Runtime 6.4), when we run it on a Spark 3.2 cluster (Runtime 10.0).

It happens whenever we have (x being a DataFrame):

x.createOrReplaceTempView('x')

... in order to use Spark SQL such as:

y = spark.sql("""
select ...
from x
...
""")

-------

It seems that the view name ('x') in

x.createOrReplaceTempView('x')

... needs to be globally unique and cannot be overwritten (even though the function createOrReplaceTempView has the word "replace" in its name).
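
For reference, here is a minimal sketch of the failing pattern as we understand it (the view is replaced by a DataFrame that was itself derived from that same view), which is what produces the recursive-view error on Spark 3.1+:

x = spark.range(5)                 # any DataFrame
x.createOrReplaceTempView('x')     # first registration works

y = spark.sql('select * from x')   # y's plan now references the view `x`
y.createOrReplaceTempView('x')     # Spark 3.2: Recursive view `x` detected (cycle: `x` -> `x`)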

Is there a general fix for this issue that does not involve setting a global "spark.<something>.legacy" option? (We would prefer to avoid that.)

If there is no fix, our current cumbersome workaround is to rewrite every

x.createOrReplaceTempView('x')

to

z = f'x_{timestamp_as_text_with_underscores}'
x.createOrReplaceTempView(z)

... and then, of course, interpolate the dynamic name into the SQL:

y = spark.sql(f"""
select ...
from {z}
...
""")

... but we would prefer a more elegant solution, if there is one.
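
For completeness, a small helper along these lines would centralize the renaming (register_unique_view and the timestamp format are just our own sketch, not an established API):

from datetime import datetime

def register_unique_view(df, base_name):
    # Append a timestamp so each registration gets a fresh, globally unique view name.
    suffix = datetime.now().strftime('%Y_%m_%d_%H_%M_%S_%f')
    name = f'{base_name}_{suffix}'
    df.createOrReplaceTempView(name)
    return name

z = register_unique_view(x, 'x')
y = spark.sql(f'select * from {z}')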

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Strange, it should work as it is. The only other idea I have is to try createOrReplaceGlobalTempView, which uses the global temp database. createOrReplaceTempView uses the "default" database; maybe there were some issues with that during the migration? You could also create a new Spark database.

The functions below can also be useful for diagnosing the problem:

spark.catalog.listDatabases() 
spark.catalog.currentDatabase() 
spark.catalog.listTables('default')
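
If you do try global temp views, note that they live in the reserved global_temp database and have to be qualified when queried; a quick sketch:

x.createOrReplaceGlobalTempView('x')           # registered in the global_temp database
y = spark.sql('select * from global_temp.x')   # must be qualified with global_temp
spark.catalog.listTables('global_temp')        # lists global temp views (and session-local ones)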

RasmusOlesen
New Contributor III

Thanks a lot, Hubert.

Could it be a bug where Spark 3.2's createOrReplaceTempView is accidentally using the global database instead of the "default" one?

We implemented a makeshift workaround in which the names we register are made unique via timestamps, and we then use these dynamic names in the Spark SQL.

Thanks again for the expressions to check the registered names in the databases!

RobinL
New Contributor II

I'm experiencing the same problem on Spark 3.2.0 (not Databricks). This was the only other reference to this problem I've found online. I've raised an issue in the main Spark project here: https://issues.apache.org/jira/browse/SPARK-37690. There's a reproducible example that works in 2.x and 3.2.1, so I'm fairly confident this is a bug introduced in 3.2.0.

arkrish
New Contributor II

This is a breaking change introduced in Spark 3.1.

From the Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org):

In Spark 3.1, temporary views have the same behavior as permanent views, i.e. they capture and store runtime SQL configs, SQL text, catalog, and namespace. The captured view properties are applied during the parsing and analysis phases of view resolution. To restore the behavior before Spark 3.1, you can set spark.sql.legacy.storeAnalyzedPlanForView to true.

I've tried setting spark.sql.legacy.storeAnalyzedPlanForView to true and was able to restore the old behaviour.
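
For anyone else hitting this, the config should also work when set per session (e.g. at the top of a notebook) rather than cluster-wide; a minimal sketch:

# Restore the pre-3.1 temp view behavior for the current Spark session.
spark.conf.set('spark.sql.legacy.storeAnalyzedPlanForView', 'true')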
