How to parse a VARIANT type column using PySpark syntax?

juanicobsider
New Contributor

I'm trying to parse a VARIANT column. What is the correct syntax for extracting sub-columns with PySpark, and is it possible at all? I know how to do it with SQL syntax, but I'd like to do it this way.

szymon_dybczak
Contributor III

Hi @juanicobsider,

I think this syntax is not fully supported in PySpark yet. As a workaround, you can use `expr` like this:

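from pyspark.sql import Row
from pyspark.sql.functions import col, expr, parse_json

# `spark` and `display` are provided by the Databricks notebook

# One row holding a JSON document as a plain string
json_string = '{"title":"example", "animal": "test"}'
df = spark.createDataFrame([Row(json_col=json_string)])

# Convert the STRING column into a VARIANT column
df = df.select(parse_json(col("json_col")).alias("json_col"))

# Use the SQL path syntax (column:field) through expr(); the result is a VARIANT
display(df.select(expr("json_col:animal")))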

Witold
Contributor III

To add to what @szymon_dybczak correctly said: this is not actually a workaround; it's designed and documented that way. Make sure you understand the difference between `:` and `.`: `:` extracts a path from a VARIANT (or JSON string) column, while `.` accesses a field of a STRUCT column.

For PySpark, the API also provides other VARIANT-related functions, such as `variant_get`.
