variant_explode_outer stop working after the last DBX runtime patch

norbitek
New Contributor II

Hi All,

I import following JSON to delta table into VARIANT column:

{
    "data": [
        {
            "group": 1,
            "manager": "no",
            "firstname": "John",
            "lastname": "Smith",
            "active": "false",
            "team_lead": "yes"
        }
    ],
    "page": 1,
    "size": 5000,
    "timestamp": 1775455219,
    "total": 1342,
    "totalPages": 1
}

Because of the size I import only data node as VARIANT

To do that I use following logic:

df_transformed = spark.read.format("json")
    .option("modifiedAfter", modified_after)
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")

df_transformed.createOrReplaceTempView("df_transformed")

or streaming version:

df_transformed = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")

and then query:

select * except(DATA) from df_transformed
,LATERAL variant_explode_outer(DATA:data) AS DATA_exploded

worked as expected before 1st of April with runtime 17.3.8

Then DBX upgraded the runtime to version 17.3.9 and streaming version raise following error:

org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22

spark.read version still works as expected

I tested this with the latest version of 18.0, 18.1 and get the same error

The only version that still works is unsupported 17.2

Probably because it didn't get any update

How to fix this problem?

Am I doing something wrong in this query?