emma_s
Databricks Employee

Hi, 

I've been testing this on a workspace at my end and see exactly the same thing. I'd first recommend raising a support ticket for this. 

In the meantime, you can use the workaround described further down.

I reproduced it on DBR 18.0 using readStream + cloudFiles + singleVariantColumn - the exact error you're seeing:

[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
Correlated column reference 'DATA' cannot be variant type.

It only affects streaming; the same query works fine in batch with spark.read. That is why a support ticket is worthwhile: this could be a regression from 17.3.8.
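For anyone who wants to reproduce this, here is a minimal sketch of that kind of streaming read: Auto Loader (cloudFiles) with singleVariantColumn. The function name `read_variant_stream` and the source path are hypothetical, the JSON file format is an assumption, and depending on your setup you may need further Auto Loader options.

```python
def read_variant_stream(spark, source_path, variant_column="DATA"):
    """Return a streaming DataFrame whose whole payload sits in one variant column.

    `spark` is the active SparkSession (predefined in a Databricks notebook).
    `source_path` is a hypothetical location of the incoming JSON files.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumption: the source files are JSON
        .option("singleVariantColumn", variant_column)  # load each record as one variant
        .load(source_path)
    )
```

In a notebook this would be called as `read_variant_stream(spark, "<your source path>")`. The batch equivalent, which does not hit the error, uses spark.read instead of spark.readStream.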

Workaround (tested on DBR 18.0 in streaming):

Convert the variant array to a typed array with from_json, then use standard explode_outer. The query below casts the variant to a string, parses it into a typed array, and uses explode_outer instead of variant_explode_outer. That avoids the correlated variant reference that triggers the error:

SELECT record.*
FROM (
  SELECT explode_outer(
    from_json(
      DATA:data::STRING,
      'array<struct<group:int, manager:string, firstname:string, lastname:string, active:string, team_lead:string>>'
    )
  ) AS record
  FROM df_transformed
) t


The trade-off is you need to define the struct schema in the from_json call, which you don't with variant_explode_outer. But it works in both batch and streaming.
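Since the schema now has to be maintained by hand, one option is to keep the field list in a single place and build the DDL string from it. The sketch below does this in Python; `RECORD_FIELDS`, `array_of_struct_ddl` and `explode_variant_records` are hypothetical names, and it assumes a Databricks runtime where the `DATA:data::STRING` path syntax is accepted inside selectExpr.

```python
# Field names and types copied from the from_json schema in the SQL workaround.
RECORD_FIELDS = [
    ("group", "int"),
    ("manager", "string"),
    ("firstname", "string"),
    ("lastname", "string"),
    ("active", "string"),
    ("team_lead", "string"),
]


def array_of_struct_ddl(fields):
    """Build a DDL string of the form 'array<struct<a:int, b:string>>'."""
    inner = ", ".join(f"{name}:{dtype}" for name, dtype in fields)
    return f"array<struct<{inner}>>"


def explode_variant_records(df, variant_path="DATA:data", fields=RECORD_FIELDS):
    """Apply the from_json + explode_outer workaround to a DataFrame.

    `df` is assumed to be a batch or streaming DataFrame with a variant
    column named DATA, like df_transformed in the SQL version.
    """
    schema = array_of_struct_ddl(fields)
    exploded = df.selectExpr(
        f"explode_outer(from_json({variant_path}::STRING, '{schema}')) AS record"
    )
    # Expand the struct into one top-level column per field.
    return exploded.select("record.*")
```

Adding or renaming a field then means editing `RECORD_FIELDS` only, rather than a long schema string embedded in the query.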
