- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-08-2026 12:43 AM - edited 04-08-2026 12:45 AM
Hi All,
I import following JSON to delta table into VARIANT column:
{
"data": [
{
"group": 1,
"manager": "no",
"firstname": "John",
"lastname": "Smith",
"active": "false",
"team_lead": "yes"
}
],
"page": 1,
"size": 5000,
"timestamp": 1775455219,
"total": 1342,
"totalPages": 1
}Because of the size I import only data node as VARIANT
To do that I use following logic:
df_transformed = spark.read.format("json")
.option("modifiedAfter", modified_after)
.option("singleVariantColumn", "DATA")
.load(f"{FILE_PATH}")
df_transformed.createOrReplaceTempView("df_transformed")or streaming version:
df_transformed = (
spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "json")
.option("singleVariantColumn", "DATA")
.load(f"{FILE_PATH}")
)
df_transformed.createOrReplaceTempView("df_transformed")and then query:
select * except(DATA) from df_transformed
,LATERAL variant_explode_outer(DATA:data) AS DATA_explodedworked as expected before 1st of April with runtime 17.3.8
Then DBX upgraded the runtime to version 17.3.9 and streaming version raise following error:
org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22
spark.read version still works as expected
I tested this with the latest version of 18.0, 18.1 and get the same error
The only version that still works is unsupported 17.2
Probably because it didn't get any update
How to fix this problem?
Am I doing something wrong in this query?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-08-2026 09:06 AM
Hi,
I've been testing this on a workspace at my end and see exactly the same thing. I'd first recommend raising a support ticket for this.
In the meantime you can use the following workaround:
I reproduced it on DBR 18.0 using readStream + cloudFiles + singleVariantColumn - the exact error you're seeing:
[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
Correlated column reference 'DATA' cannot be variant type.
It only affects streaming. The same query works fine in batch with spark.read. I'd recommend raising a support ticket as this could be a regression from 17.3.8.
Workaround (tested on DBR 18.0 in streaming):
Convert the variant array to a typed array with from_json, then use standard explode_outer:
This casts the variant to a string, parses it into a typed array, and uses explode_outer instead of variant_explode_outer. It avoids the correlated variant reference that
triggers the error.
SELECT record.*
FROM (
SELECT explode_outer(
from_json(
DATA:data::STRING,
'array<struct<group:int, manager:string, firstname:string, lastname:string, active:string, team_lead:string>>'
)
) AS record
FROM df_transformed
) t
The trade-off is you need to define the struct schema in the from_json call, which you don't with variant_explode_outer. But it works in both batch and streaming.