- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-10-2024 05:23 AM - edited 12-10-2024 05:31 AM
Edit: value of inner key "value" was an array - I have added the square brackets to the example below.
Hello all,
I'm working with Spark SQL API for querying semi-structured data in Databricks. Currently I'm having a hard time understanding how I can navigate a complex column of the type `VARIANT` to extract field values that are nested at multiple levels.
Take the below example (column name "extendedinformation") in a table (table name "orders") that have been saved as a complex type containing nested json like so:
[
{
"name": "CHANNEL",
"value": "[{\"id\":\"DUMMYID1\",\"name\":\"DUMMYCHANNEL1\",\"role\":\"DUMMYROLE1\"}]"
}
]
When querying the data using Spark SQL API on a SQL Warehouse in the Preview Channel (2024.40), I'm unable to find the best way to extract the inner field "name" ($[0].value.name).
If I use variant functions and shortcuts as so:
SELECT extendedinformation:[0].value FROM orders
i get back the contents of $[0].value, and as far as the front-end editor in Databricks shows, this column is a complex type as this (whether this is the true underlying truth is unknown to me, but this i what is shown in the online editor):
[{"id":"DUMMYID1","name":"DUMMYCHANNEL1","role":"DUMMYROLE1"}]
Note that it seems like the contents have been parsed automatically and correctly.
Now I had hoped I could do the following:
SELECT extendedinformation:[0].value.name FROM orders
or
SELECT extendedinformation:[0].value:name FROM orders
But all I get back are `NULL` values.
There might be way to get it to work wrapping `GET_JSON_OBJECT()`, but this is very finicky, has resulted in errors in terms of parsing without changing modes (`FAILFAST` -> `PERMISSIVE`) and, as far as I can understand, might very well incure a performance penalty because of conversions between complex and strings (and then parsing).
How should one approach this using complex functionality in Spark SQL in Databricks?