Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

variant_explode_outer stopped working after the latest DBX runtime patch

norbitek
New Contributor II

Hi All,

I import the following JSON into a VARIANT column of a Delta table:

{
    "data": [
        {
            "group": 1,
            "manager": "no",
            "firstname": "John",
            "lastname": "Smith",
            "active": "false",
            "team_lead": "yes"
        }
    ],
    "page": 1,
    "size": 5000,
    "timestamp": 1775455219,
    "total": 1342,
    "totalPages": 1
}
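For reference, here is a plain-Python sketch (no Spark involved) of the rows the lateral join should produce from the data node, under the assumption that variant_explode_outer follows the documented pos/key/value shape for arrays, where key is NULL for array input:

```python
import json

# Sample payload from the post (trimmed to the fields that matter here).
payload = json.loads("""
{
    "data": [
        {"group": 1, "manager": "no", "firstname": "John",
         "lastname": "Smith", "active": "false", "team_lead": "yes"}
    ],
    "page": 1, "size": 5000, "total": 1342, "totalPages": 1
}
""")

# variant_explode_outer over an array yields one row per element:
# pos = element index, key = NULL (only set for objects), value = the element.
rows = [(pos, None, value) for pos, value in enumerate(payload["data"])]

for pos, key, value in rows:
    print(pos, key, value["firstname"], value["lastname"])
# → 0 None John Smith
```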

Because of the size, I import only the data node as VARIANT.

To do that I use the following logic:

df_transformed = (
    spark.read.format("json")
    .option("modifiedAfter", modified_after)
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")

or streaming version:

df_transformed = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")

and then query:

SELECT * EXCEPT(DATA)
FROM df_transformed,
LATERAL variant_explode_outer(DATA:data) AS DATA_exploded

This worked as expected before the 1st of April on runtime 17.3.8.

Then DBX upgraded the runtime to version 17.3.9, and the streaming version raises the following error:

org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22

The spark.read version still works as expected.

I tested this with the latest 18.0 and 18.1 versions and get the same error.

The only version that still works is the unsupported 17.2, probably because it didn't receive any update.

How can I fix this problem?

Am I doing something wrong in this query?

 

1 REPLY 1

emma_s
Databricks Employee

Hi, 

I've been testing this on a workspace at my end and see exactly the same thing. I'd first recommend raising a support ticket for this. 

I reproduced it on DBR 18.0 using readStream + cloudFiles + singleVariantColumn, with the exact error you're seeing:

[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
Correlated column reference 'DATA' cannot be variant type.

It only affects streaming; the same query works fine in batch with spark.read. This looks like a regression from 17.3.8, which is why I'd recommend the support ticket. In the meantime you can use the following workaround:

Workaround (tested on DBR 18.0 in streaming):

Convert the variant array to a typed array with from_json, then use standard explode_outer:


SELECT record.*
FROM (
  SELECT explode_outer(
    from_json(
      DATA:data::STRING,
      'array<struct<group:int, manager:string, firstname:string, lastname:string, active:string, team_lead:string>>'
    )
  ) AS record
  FROM df_transformed
) t

This casts the variant to a string, parses it into a typed array, and uses explode_outer instead of variant_explode_outer, avoiding the correlated variant reference that triggers the error.


The trade-off is you need to define the struct schema in the from_json call, which you don't with variant_explode_outer. But it works in both batch and streaming.
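Since that struct schema is written by hand, it can silently drift from the actual JSON keys if the source API changes. A small plain-Python check (a hypothetical helper, not part of Spark) that compares the field names in the DDL schema string against a sample record:

```python
import json
import re

# The hand-written schema from the workaround above.
schema = ("array<struct<group:int, manager:string, firstname:string, "
          "lastname:string, active:string, team_lead:string>>")

# One record from the source JSON.
sample = {"group": 1, "manager": "no", "firstname": "John",
          "lastname": "Smith", "active": "false", "team_lead": "yes"}

# Pull the field names out of the struct<...> definition.
inner = re.search(r"struct<(.*)>>", schema).group(1)
schema_fields = {part.split(":")[0].strip() for part in inner.split(",")}

missing = set(sample) - schema_fields  # keys from_json would silently drop
extra = schema_fields - set(sample)    # schema fields that would come back NULL
print(missing, extra)
# → set() set()  (schema and sample agree)
```

Running a check like this against a fresh sample file before deploying a schema change is a cheap way to catch the "silently NULL column" failure mode that from_json has and variant_explode_outer doesn't.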