Hi All,
I import the following JSON into a VARIANT column of a Delta table:
{
"data": [
{
"group": 1,
"manager": "no",
"firstname": "John",
"lastname": "Smith",
"active": "false",
"team_lead": "yes"
}
],
"page": 1,
"size": 5000,
"timestamp": 1775455219,
"total": 1342,
"totalPages": 1
}
Because of the payload size I ingest each record as a single VARIANT column and only work with the data node.
To do that I use the following logic:
df_transformed = (
    spark.read.format("json")
    .option("modifiedAfter", modified_after)
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)
df_transformed.createOrReplaceTempView("df_transformed")
or the streaming version:
df_transformed = (
spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "json")
.option("singleVariantColumn", "DATA")
.load(f"{FILE_PATH}")
)
df_transformed.createOrReplaceTempView("df_transformed")
and then query it with:
select * except(DATA)
from df_transformed,
  LATERAL variant_explode_outer(DATA:data) AS DATA_exploded
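For context on what the lateral explode produces: variant_explode_outer emits one row per element with pos, key, and value columns, where key is NULL when the input is an array. A plain-Python sketch of that behavior over the sample payload above (illustrative only, not Spark code):

```python
import json

# Sample payload from the post; with singleVariantColumn the whole
# record lands in DATA, and DATA:data is the array being exploded.
payload = json.loads("""
{
  "data": [
    {"group": 1, "manager": "no", "firstname": "John",
     "lastname": "Smith", "active": "false", "team_lead": "yes"}
  ],
  "page": 1, "size": 5000, "timestamp": 1775455219,
  "total": 1342, "totalPages": 1
}
""")

# variant_explode_outer(DATA:data) yields (pos, key, value) per element;
# for an array input, key is NULL and pos is the zero-based index.
rows = [(pos, None, value) for pos, value in enumerate(payload["data"])]
for pos, key, value in rows:
    print(pos, key, value["firstname"], value["lastname"])
```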
This worked as expected before the 1st of April on runtime 17.3.8.
Then Databricks upgraded the runtime to 17.3.9, and the streaming version now raises the following error:
org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22
The spark.read (batch) version still works as expected.
I tested the latest 18.0 and 18.1 runtimes and get the same error.
The only version that still works is the unsupported 17.2, probably because it didn't receive the update.
How can I fix this problem? Am I doing something wrong in the query?
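For reference, one direction I am considering (an untested sketch, assuming VARIANT-to-ARRAY casts are supported on the runtime) is to avoid the correlated VARIANT-typed reference the error complains about by casting the node to ARRAY<VARIANT> and exploding that instead:

```sql
-- Untested sketch: cast removes the VARIANT-typed correlated reference
select * except(DATA)
from df_transformed
  LATERAL VIEW OUTER posexplode_outer(cast(DATA:data AS ARRAY<VARIANT>))
    DATA_exploded AS pos, value
```

I have not verified whether this actually avoids the analyzer error, so corrections are welcome.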