Databricks Community

qwerty1 · ‎04-24-2023

My data is a dump of the JSON response from an API. The schema of the JSON is:

col_name  data_type
 
data           array<struct<attributes:struct<name: string, age: int, relationships:struct<address:struct<data:array<struct<id: long, type: string>>>>>>>
 
included    array<struct<id: long, type: string, attributes:struct<address: string, postalCode: string, country: string>>>

As you can see, the column `data` contains an array of person details, and each person includes a relationship to that person's address via an id. The column `included` contains the actual address.

I want to transform this data into a new table where the person data includes the address. In short I want to get rid of this `included` business. I only have SQL to go with right now because I am using this in a STREAMING LIVE TABLE query.

qwerty1 · ‎04-26-2023

I used a similar solution (exploding only one column) and it worked.
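
For anyone landing here later, this is a rough sketch of what "exploding only one column" could look like, not necessarily the exact query that was used. It explodes `data` only and looks up the matching addresses inside the same row with the Spark SQL higher-order functions `filter` and `exists`, so no join across rows is needed. The source name `raw_api_dump` and the output column names are placeholders, and the field path follows the schema exactly as posted above (with `relationships` nested under `attributes`); adjust it if your real schema differs.

```sql
-- Assumption: raw_api_dump is a placeholder for the table/view holding the
-- raw API dump, with the two columns `data` and `included` described above.
SELECT
  person.attributes.name AS name,
  person.attributes.age  AS age,
  -- Keep only the `included` entries referenced by this person.
  -- The lookup happens within the row, so there is no join.
  filter(
    included,
    inc -> exists(
      person.attributes.relationships.address.data,
      ref -> ref.id = inc.id AND ref.type = inc.type
    )
  ) AS addresses
FROM (
  -- Explode only `data`: one output row per person, with `included` carried along.
  SELECT explode(data) AS person, included
  FROM raw_api_dump
) AS exploded;
```

`addresses` comes back as an array of the matching `included` structs, so something like `addresses[0].attributes.postalCode` should give a single field. Since this is a plain per-row projection with no join or aggregation, it should fit a streaming live table once the inner `FROM` reads from the streaming source.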

qwerty1 · ‎04-24-2023

@Kaniz Fatma isn't this basically doing an "explode" on both "data" and "included" and then joining them? We end up doing a join on the whole dataset instead of within the row.