Upgrading from Spark 2.4 to 3.2: Recursive view errors when using createOrReplaceTempView

RasmusOlesen
New Contributor III

We get errors like this:

Recursive view `x` detected (cycle: `x` -> `x`)

... in long-standing code that has worked just fine on Spark 2.4.5 (Runtime 6.4), when we run it on a Spark 3.2 cluster (Runtime 10.0).

It happens whenever we have (x being a DataFrame):

x.createOrReplaceTempView('x')

... in order to use Spark SQL such as:

y = spark.sql("""
select ...
from x
...
""")

-------

It seems that the view name ('x') in

x.createOrReplaceTempView('x')

... needs to be globally unique and cannot be overwritten (even though the function createOrReplaceTempView has the word "replace" in its name).
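
For reference, here is a minimal sketch of the failing pattern as we understand it (the view is replaced by a DataFrame that was itself derived from that same view), which is what produces the recursive-view error on Spark 3.1+:

x = spark.range(5)                 # any DataFrame
x.createOrReplaceTempView('x')     # first registration works

y = spark.sql('select * from x')   # y's plan now references the view `x`
y.createOrReplaceTempView('x')     # Spark 3.2: Recursive view `x` detected (cycle: `x` -> `x`)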

Is there a general fix for this issue that does not involve setting a global "spark.<something>.legacy" option? (We would prefer to avoid that.)

If there is no fix, our current cumbersome workaround is to rewrite every

x.createOrReplaceTempView('x')

to

z = f'x_{timestamp_as_text_with_underscores}'
x.createOrReplaceTempView(z)

... and then, of course, interpolate the dynamic name into the SQL:

y = spark.sql(f"""
select ...
from {z}
...
""")

... but we would prefer a more elegant solution, if there is one.
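
For completeness, a small helper along these lines would centralize the renaming (register_unique_view and the timestamp format are just our own sketch, not an established API):

from datetime import datetime

def register_unique_view(df, base_name):
    # Append a timestamp so each registration gets a fresh, globally unique view name.
    suffix = datetime.now().strftime('%Y_%m_%d_%H_%M_%S_%f')
    name = f'{base_name}_{suffix}'
    df.createOrReplaceTempView(name)
    return name

z = register_unique_view(x, 'x')
y = spark.sql(f'select * from {z}')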

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Strange, it should work as it is. The only other idea I have is to try createOrReplaceGlobalTempView, which uses the global temp database. createOrReplaceTempView uses the "default" database; maybe there were some issues with that during the migration? You could also create a new Spark database.

The functions below can also be useful for diagnosing the problem:

spark.catalog.listDatabases() 
spark.catalog.currentDatabase() 
spark.catalog.listTables('default')
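
If you do try global temp views, note that they live in the reserved global_temp database and have to be qualified when queried; a quick sketch:

x.createOrReplaceGlobalTempView('x')           # registered in the global_temp database
y = spark.sql('select * from global_temp.x')   # must be qualified with global_temp
spark.catalog.listTables('global_temp')        # lists global temp views (and session-local ones)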

RasmusOlesen
New Contributor III

Thanks a lot, Hubert.

Could it be a bug where Spark 3.2's createOrReplaceTempView is accidentally using the global database instead of the "default" one?

We implemented a makeshift workaround in which the names we register are made unique via timestamps, and we then use these dynamic names in the Spark SQL.

Thanks again for the expressions to check the registered names in the databases!

RobinL
New Contributor II

I'm experiencing the same problem on Spark 3.2.0 (not Databricks). This was the only other reference to this problem I've found online. I've raised an issue in the main Spark project here: https://issues.apache.org/jira/browse/SPARK-37690. There's a reproducible example that works in 2.x and 3.2.1, so I'm fairly confident this is a bug introduced in 3.2.0.

arkrish
New Contributor II

This is a breaking change introduced in Spark 3.1.

From the Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org):

In Spark 3.1, temporary views have the same behavior as permanent views, i.e. they capture and store runtime SQL configs, SQL text, catalog, and namespace. The captured view properties are applied during the parsing and analysis phases of view resolution. To restore the behavior before Spark 3.1, you can set spark.sql.legacy.storeAnalyzedPlanForView to true.

I've tried setting spark.sql.legacy.storeAnalyzedPlanForView to true and was able to restore the old behaviour.
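
For anyone else hitting this, the config should also work when set per session (e.g. at the top of a notebook) rather than cluster-wide; a minimal sketch:

# Restore the pre-3.1 temp view behavior for the current Spark session.
spark.conf.set('spark.sql.legacy.storeAnalyzedPlanForView', 'true')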
