How to parse a VARIANT type column using PySpark syntax?
08-05-2024 06:35 PM
I'm trying to parse a VARIANT data type column. What is the correct syntax to parse sub-columns using PySpark, and is it possible at all? I'd like to know how to do it this way (I already know how to do it using SQL syntax).
08-05-2024 11:54 PM
Hi @juanicobsider ,
I think that syntax is not fully supported in PySpark yet. As a workaround, you can use expr as shown below:
from pyspark.sql import Row
from pyspark.sql.functions import col, expr, parse_json

json_string = '{"title":"example", "animal": "test"}'
df = spark.createDataFrame([Row(json_col=json_string)])

# Convert the JSON string column to VARIANT
df = df.select(parse_json(col("json_col")).alias("json_col"))

# Extract the "animal" field with the SQL colon path syntax
display(df.select(expr("json_col:animal")))
08-06-2024 01:41 AM
To add to what @szymon_dybczak already said: it's actually not a workaround; it's designed and documented that way. Make sure you understand the difference between `:` and `.`.
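As a rough illustration of the `:` versus `.` distinction, here is a minimal sketch. It assumes the Databricks path syntax, where `:` separates the column from the first key, `.` separates nested keys, and `::` casts the result. The helper names `colon_path` and `select_paths` are hypothetical, and `df` is assumed to be the sample DataFrame from the earlier reply, with a VARIANT column named json_col.

```python
def colon_path(column, *keys, cast=None):
    """Build a path expression such as 'json_col:owner.name::string'.

    Hypothetical helper for illustration only; it just assembles a string.
    """
    if not keys:
        raise ValueError("at least one key is required")
    # ':' goes between the column and the first key, '.' between nested keys
    expression = f"{column}:{'.'.join(keys)}"
    if cast:
        # '::' casts the extracted VARIANT value to a concrete SQL type
        expression += f"::{cast}"
    return expression


def select_paths(df):
    """Select two fields from the assumed VARIANT column 'json_col'."""
    # Imported lazily so colon_path can be tried without a Spark installation
    from pyspark.sql.functions import expr

    return df.select(
        expr(colon_path("json_col", "animal", cast="string")).alias("animal"),
        expr(colon_path("json_col", "title")).alias("title"),
    )
```

Without the cast, the extracted value should still be a VARIANT; with `::string` it becomes a regular string column.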
Regarding PySpark, the API has other variant-related functions as well, such as `variant_get`.
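For example, a minimal sketch of the `variant_get` route might look like the following. It assumes a runtime where `pyspark.sql.functions.variant_get` is available (Spark 4.0 or a recent Databricks runtime) and a DataFrame `df` with a VARIANT column named json_col, as in the earlier reply. The helper names `variant_path` and `select_animal` are hypothetical.

```python
def variant_path(*keys):
    """Build a JSONPath-style string such as '$.store.items[0]'.

    Hypothetical helper: string keys become '.key', integers become '[index]'.
    """
    path = "$"
    for key in keys:
        if isinstance(key, int):
            path += f"[{key}]"
        else:
            path += f".{key}"
    return path


def select_animal(df):
    """Extract the 'animal' field from the assumed VARIANT column as a string."""
    # Imported lazily so variant_path can be tried without a Spark installation
    from pyspark.sql.functions import col, variant_get

    return df.select(
        variant_get(col("json_col"), variant_path("animal"), "string").alias("animal")
    )
```

Calling `display(select_animal(df))` on the sample data should show the animal field as a plain string column, without going through `expr`.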