<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to parse VARIANT type column using PySpark syntax? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81993#M36469</link>
    <description>&lt;P&gt;To add to what&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;already correctly said: it's actually not a workaround, it's designed and &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/semi-structured/variant#query-fields-in-a-variant-column" target="_self"&gt;documented&lt;/A&gt; that way. Make sure that you understand the difference between `:` and `.`.&lt;/P&gt;&lt;P&gt;Regarding PySpark, the API has other variant-related functions as well, like &lt;A href="https://spark.apache.org/docs/4.0.0-preview1/api/python/reference/pyspark.sql/api/pyspark.sql.functions.variant_get.html" target="_self"&gt;variant_get&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Aug 2024 08:41:54 GMT</pubDate>
    <dc:creator>Witold</dc:creator>
    <dc:date>2024-08-06T08:41:54Z</dc:date>
    <item>
      <title>How to parse VARIANT type column using PySpark syntax?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81955#M36457</link>
      <description>&lt;P&gt;I am trying to parse a VARIANT data type column. What is the correct syntax to parse sub-columns using PySpark, and is it possible? I'd like to know how to do it this way (I already know how to do it using SQL syntax).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="juanicobsider_0-1722907722976.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/10148i7D0ADCEC7BCC2507/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="juanicobsider_0-1722907722976.png" alt="juanicobsider_0-1722907722976.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="juanicobsider_1-1722907840323.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/10150i548527B2F24D6052/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="juanicobsider_1-1722907840323.png" alt="juanicobsider_1-1722907840323.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="juanicobsider_2-1722907947212.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/10151iC764BDD86608C696/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="juanicobsider_2-1722907947212.png" alt="juanicobsider_2-1722907947212.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 01:35:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81955#M36457</guid>
      <dc:creator>juanicobsider</dc:creator>
      <dc:date>2024-08-06T01:35:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to parse VARIANT type column using PySpark syntax?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81973#M36458</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114927"&gt;@juanicobsider&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I think that syntax is not fully supported in PySpark yet. As a workaround, you can use expr as shown below:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import Row
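from pyspark.sql.functions import col, expr, parse_json

# A single-row DataFrame holding a JSON document as a plain string.
json_string = '{"title":"example", "animal": "test"}'
df = spark.createDataFrame([Row(json_col=json_string)])

# parse_json converts the string column into a VARIANT column.
df = df.select(parse_json(col("json_col")).alias("json_col"))

# The `:` path operator is SQL syntax, so it is passed through expr().
display(df.select(expr("json_col:animal")))&lt;/LI-CODE&gt;</description>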
      <pubDate>Tue, 06 Aug 2024 06:54:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81973#M36458</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-08-06T06:54:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to parse VARIANT type column using PySpark syntax?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81993#M36469</link>
      <description>&lt;P&gt;To add to what&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;already correctly said: it's actually not a workaround, it's designed and &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/semi-structured/variant#query-fields-in-a-variant-column" target="_self"&gt;documented&lt;/A&gt; that way. Make sure that you understand the difference between `:` and `.`.&lt;/P&gt;&lt;P&gt;Regarding PySpark, the API has other variant-related functions as well, like &lt;A href="https://spark.apache.org/docs/4.0.0-preview1/api/python/reference/pyspark.sql/api/pyspark.sql.functions.variant_get.html" target="_self"&gt;variant_get&lt;/A&gt;.&lt;/P&gt;</description>
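The variant_get function mentioned in this reply can stand in for the `:` operator when staying in the DataFrame API. Here is a minimal sketch, assuming a runtime whose PySpark ships variant_get (Spark 4.0 or a recent Databricks Runtime) and a DataFrame with a VARIANT column such as json_col from the earlier reply. The names variant_path and select_variant_field are hypothetical helpers for illustration, not part of PySpark.

```python
def variant_path(*keys):
    """Build a JSONPath-style string such as '$.a.b' from plain object keys.

    Hypothetical helper: handles simple keys only, with no array indexes
    and no keys that need bracket quoting.
    """
    return "$" + "".join(f".{key}" for key in keys)


def select_variant_field(df, column, *keys, target_type="string"):
    """Select one nested field from a VARIANT column, cast to target_type."""
    # Imported here so the path helper stays usable without Spark installed.
    from pyspark.sql.functions import col, variant_get

    return df.select(
        variant_get(col(column), variant_path(*keys), target_type).alias(keys[-1])
    )
```

With the DataFrame from the earlier reply, select_variant_field(df, "json_col", "animal") should be roughly the DataFrame-API counterpart of expr("json_col:animal"), except that the result is cast to a string instead of being left as VARIANT.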
      <pubDate>Tue, 06 Aug 2024 08:41:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-variant-type-column-using-pyspark-sintax/m-p/81993#M36469</guid>
      <dc:creator>Witold</dc:creator>
      <dc:date>2024-08-06T08:41:54Z</dc:date>
    </item>
  </channel>
</rss>

