<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic variant_explode_outer stopped working after the last DBX runtime patch in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/variant-explode-outer-stop-working-after-the-last-dbx-runtime/m-p/153678#M53995</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I import the following JSON into a VARIANT column of a Delta table:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
    "data": [
        {
            "group": 1,
            "manager": "no",
            "firstname": "John",
            "lastname": "Smith",
            "active": "false",
            "team_lead": "yes"
        }
    ],
    "page": 1,
    "size": 5000,
    "timestamp": 1775455219,
    "total": 1342,
    "totalPages": 1
}&lt;/LI-CODE&gt;&lt;P&gt;Because of the size, I import only the data node as VARIANT.&lt;/P&gt;&lt;P&gt;To do that I use the following logic:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df_transformed = (
    spark.read.format("json")
    .option("modifiedAfter", modified_after)
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")&lt;/LI-CODE&gt;&lt;P&gt;or streaming version:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df_transformed = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")&lt;/LI-CODE&gt;&lt;P&gt;and then run the query:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * except(DATA) from df_transformed,
LATERAL variant_explode_outer(DATA:data) AS DATA_exploded&lt;/LI-CODE&gt;&lt;P&gt;This worked as expected before the 1st of April on runtime 17.3.8.&lt;/P&gt;&lt;P&gt;Then Databricks upgraded the runtime to version 17.3.9, and the streaming version raises the following error:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The spark.read version still works as expected.&lt;/P&gt;&lt;P&gt;I tested this with the latest 18.0 and 18.1 versions and got the same error.&lt;/P&gt;&lt;P&gt;The only version that still works is the unsupported 17.2, probably because it didn't receive the update.&lt;/P&gt;&lt;P&gt;How can I fix this problem? Am I doing something wrong in this query?&lt;/P&gt;</description>
    <pubDate>Wed, 08 Apr 2026 07:45:56 GMT</pubDate>
    <dc:creator>norbitek</dc:creator>
    <dc:date>2026-04-08T07:45:56Z</dc:date>
    <item>
      <title>variant_explode_outer stopped working after the last DBX runtime patch</title>
      <link>https://community.databricks.com/t5/data-engineering/variant-explode-outer-stop-working-after-the-last-dbx-runtime/m-p/153678#M53995</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I import the following JSON into a VARIANT column of a Delta table:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
    "data": [
        {
            "group": 1,
            "manager": "no",
            "firstname": "John",
            "lastname": "Smith",
            "active": "false",
            "team_lead": "yes"
        }
    ],
    "page": 1,
    "size": 5000,
    "timestamp": 1775455219,
    "total": 1342,
    "totalPages": 1
}&lt;/LI-CODE&gt;&lt;P&gt;Because of the size, I import only the data node as VARIANT.&lt;/P&gt;&lt;P&gt;To do that I use the following logic:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df_transformed = (
    spark.read.format("json")
    .option("modifiedAfter", modified_after)
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")&lt;/LI-CODE&gt;&lt;P&gt;or streaming version:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df_transformed = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("singleVariantColumn", "DATA")
    .load(f"{FILE_PATH}")
)

df_transformed.createOrReplaceTempView("df_transformed")&lt;/LI-CODE&gt;&lt;P&gt;and then run the query:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * except(DATA) from df_transformed,
LATERAL variant_explode_outer(DATA:data) AS DATA_exploded&lt;/LI-CODE&gt;&lt;P&gt;This worked as expected before the 1st of April on runtime 17.3.8.&lt;/P&gt;&lt;P&gt;Then Databricks upgraded the runtime to version 17.3.9, and the streaming version raises the following error:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'df_transformed.DATA' cannot be variant type. SQLSTATE: 0A000; line 1 pos 22&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The spark.read version still works as expected.&lt;/P&gt;&lt;P&gt;I tested this with the latest 18.0 and 18.1 versions and got the same error.&lt;/P&gt;&lt;P&gt;The only version that still works is the unsupported 17.2, probably because it didn't receive the update.&lt;/P&gt;&lt;P&gt;How can I fix this problem? Am I doing something wrong in this query?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2026 07:45:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variant-explode-outer-stop-working-after-the-last-dbx-runtime/m-p/153678#M53995</guid>
      <dc:creator>norbitek</dc:creator>
      <dc:date>2026-04-08T07:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: variant_explode_outer stopped working after the last DBX runtime patch</title>
      <link>https://community.databricks.com/t5/data-engineering/variant-explode-outer-stop-working-after-the-last-dbx-runtime/m-p/153766#M54011</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've been testing this on a workspace at my end and see exactly the same thing. I reproduced it on DBR 18.0 using readStream + cloudFiles + singleVariantColumn - the exact error you're seeing:&lt;/P&gt;
&lt;P&gt;[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]&lt;BR /&gt;Correlated column reference 'DATA' cannot be variant type.&lt;/P&gt;
&lt;P&gt;It only affects streaming; the same query works fine in batch with spark.read. I'd recommend raising a support ticket, as this could be a regression from 17.3.8.&lt;/P&gt;
&lt;P&gt;In the meantime, here's a workaround (tested on DBR 18.0 in streaming): convert the variant array to a typed array with from_json, then use standard explode_outer. This casts the variant to a string, parses it into a typed array, and uses explode_outer instead of variant_explode_outer, which avoids the correlated variant reference that triggers the error.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;SELECT record.*
FROM (
SELECT explode_outer(
from_json(
DATA:data::STRING,
'array&amp;lt;struct&amp;lt;group:int, manager:string, firstname:string, lastname:string, active:string, team_lead:string&amp;gt;&amp;gt;'
)
) AS record
FROM df_transformed
) t&lt;/LI-CODE&gt;
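&lt;P&gt;To see the mechanics of this workaround outside Spark, here is a plain-Python analogue (a minimal sketch using a hypothetical sample row, no Spark required) that mirrors the cast-to-string, parse-with-schema, and explode steps:&lt;/P&gt;

```python
import json

# Hypothetical sample row: the single variant column DATA holds the whole
# JSON document, as produced by the singleVariantColumn option.
row = {
    "DATA": {
        "data": [
            {"group": 1, "manager": "no", "firstname": "John",
             "lastname": "Smith", "active": "false", "team_lead": "yes"}
        ],
        "page": 1,
        "size": 5000,
    }
}

# Step 1 - DATA:data::STRING: serialize the variant array to a JSON string.
data_as_string = json.dumps(row["DATA"]["data"])

# Step 2 - from_json(...): parse the string back into a typed array,
# the role the explicit array-of-struct schema plays in the SQL version.
typed_array = json.loads(data_as_string)

# Step 3 - explode_outer: emit one record per array element,
# or a single null record when the array is empty or missing.
records = typed_array if typed_array else [None]
print(records)
```

&lt;P&gt;The names here (row, typed_array, records) are illustrative only; in the actual query, Spark's from_json and explode_outer do this work row by row.&lt;/P&gt;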
&lt;P&gt;The trade-off is that you need to define the struct schema in the from_json call, which you don't with variant_explode_outer, but it works in both batch and streaming.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2026 16:06:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variant-explode-outer-stop-working-after-the-last-dbx-runtime/m-p/153766#M54011</guid>
      <dc:creator>emma_s</dc:creator>
      <dc:date>2026-04-08T16:06:01Z</dc:date>
    </item>
  </channel>
</rss>

