Hi All,
I have a deeply nested Spark DataFrame, with a schema something similar to the below:
|-- id: integer (nullable = true)
|-- lower: struct (nullable = true)
| |-- field_a: integer (nullable = true)
| |-- upper: struct (nullable = true)
| | |-- field_A: integer (nullable = true)
| | |-- num: struct (nullable = true)
| | | |-- field_1: integer (nullable = true)
| | | |-- field_2: string (nullable = true)
I'm looking to flatten this such that I have a new schema like this:
|-- id: integer (nullable = true)
|-- lower: struct (nullable = true)
|-- lower.field_a: integer (nullable = true)
|-- lower.upper: struct (nullable = true)
|-- lower.upper.field_A: integer (nullable = true)
|-- lower.upper.num: struct (nullable = true)
|-- lower.upper.num.field_1: integer (nullable = true)
|-- lower.upper.num.field_2: string (nullable = true)
The reason for this change is so I can put this into a nice table where each column is an element in my nested struct. The column names don't matter too much to me.
I know I can use df.select('*', 'lower.*', 'lower.upper.*', 'lower.upper.num.*') to get what I want; however, here's the trick...
This struct will change over time, and I am looking for an elegant way to flatten it without referencing specific columns.
Any ideas? Or tips?
Thanks
Aidonis