<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Flatten Deep Nested Struct in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11424#M6403</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a deeply nested Spark DataFrame struct, similar to the one below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; |-- id: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- lower: struct (nullable = true)&lt;/P&gt;&lt;P&gt; |    |-- field_a: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |-- upper: struct (containsNull = true)&lt;/P&gt;&lt;P&gt; |    |    |-- field_A: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |    |-- num: struct (containsNull = true)&lt;/P&gt;&lt;P&gt; |    |    |    |-- field_1: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |    |    |-- field_2: string (nullable = true)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm looking to flatten this so that I have a new struct like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;|-- id: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower: struct (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.field_a: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper: struct (containsNull = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.field_A: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num: struct (containsNull = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num.field_1: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num.field_2: string (nullable = true)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The reason for this change is so I can put this into a nice table where each column is an element in my nested struct. The column names don't matter too much to me.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I know I can use df.select('*', 'lower.*', 'lower.upper.*', 'lower.upper.num.*') to get what I want; however, here's the catch:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This struct will change over time, and I am looking for an elegant way to flatten it without referencing specific columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas or tips?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aidonis&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jan 2023 08:39:30 GMT</pubDate>
    <dc:creator>Aidonis</dc:creator>
    <dc:date>2023-01-18T08:39:30Z</dc:date>
    <item>
      <title>Flatten Deep Nested Struct</title>
      <link>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11424#M6403</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a deeply nested Spark DataFrame struct, similar to the one below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; |-- id: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |-- lower: struct (nullable = true)&lt;/P&gt;&lt;P&gt; |    |-- field_a: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |-- upper: struct (containsNull = true)&lt;/P&gt;&lt;P&gt; |    |    |-- field_A: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |    |-- num: struct (containsNull = true)&lt;/P&gt;&lt;P&gt; |    |    |    |-- field_1: integer (nullable = true)&lt;/P&gt;&lt;P&gt; |    |    |    |-- field_2: string (nullable = true)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm looking to flatten this so that I have a new struct like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;|-- id: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower: struct (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.field_a: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper: struct (containsNull = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.field_A: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num: struct (containsNull = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num.field_1: integer (nullable = true)&lt;/P&gt;&lt;P&gt;|-- lower.upper.num.field_2: string (nullable = true)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The reason for this change is so I can put this into a nice table where each column is an element in my nested struct. The column names don't matter too much to me.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I know I can use df.select('*', 'lower.*', 'lower.upper.*', 'lower.upper.num.*') to get what I want; however, here's the catch:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This struct will change over time, and I am looking for an elegant way to flatten it without referencing specific columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas or tips?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aidonis&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 08:39:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11424#M6403</guid>
      <dc:creator>Aidonis</dc:creator>
      <dc:date>2023-01-18T08:39:30Z</dc:date>
    </item>
    <item>
      <title>Re: Flatten Deep Nested Struct</title>
      <link>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11425#M6404</link>
      <description>&lt;P&gt;You can use something like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType

def flatten(schema, prefix=None):
    """Return the dotted path of every non-struct field in the schema."""
    fields = []
    for field in schema.fields:
        # Build the full path, e.g. 'lower.upper.num.field_1'
        name = prefix + '.' + field.name if prefix else field.name
        dtype = field.dataType
        if isinstance(dtype, StructType):
            # Recurse into nested structs instead of selecting them whole
            fields += flatten(dtype, prefix=name)
        else:
            fields.append(name)

    return fields

# Each selected column keeps only its leaf name, e.g. 'field_1'
df.select(flatten(df.schema))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 08:52:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11425#M6404</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2023-01-18T08:52:16Z</dc:date>
    </item>
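    <item>
      <title>Re: Flatten Deep Nested Struct</title>
      <description>A possible follow-up to the recursive flatten approach above: selecting a dotted path such as lower.upper.num.field_1 names the output column after the leaf only (field_1), so fields like field_a and field_A can collide under Spark's default case-insensitive column resolution when the result is saved as a table. Below is a minimal sketch, not a definitive implementation, that pairs every leaf path with an alias built from the full path. It assumes the schema is available as the plain dict returned by df.schema.jsonValue(); the function name leaf_columns and the underscore-joined aliases are illustrative choices, and example_schema is a hand-written stand-in for the schema in the question.

```python
# Sketch only: works on the plain-dict form of a Spark schema, i.e. what
# df.schema.jsonValue() returns, so it runs without a Spark session.
def leaf_columns(schema_json, prefix=""):
    """Return (dotted_path, flat_alias) pairs for every non-struct field."""
    pairs = []
    for field in schema_json["fields"]:
        path = prefix + "." + field["name"] if prefix else field["name"]
        dtype = field["type"]
        # Nested structs are dicts with type == "struct"; simple types are strings.
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            pairs.extend(leaf_columns(dtype, path))
        else:
            pairs.append((path, path.replace(".", "_")))
    return pairs


# Hand-written stand-in for df.schema.jsonValue() on the schema in the question.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "lower", "type": {"type": "struct", "fields": [
            {"name": "field_a", "type": "integer", "nullable": True, "metadata": {}},
            {"name": "upper", "type": {"type": "struct", "fields": [
                {"name": "field_A", "type": "integer", "nullable": True, "metadata": {}},
                {"name": "num", "type": {"type": "struct", "fields": [
                    {"name": "field_1", "type": "integer", "nullable": True, "metadata": {}},
                    {"name": "field_2", "type": "string", "nullable": True, "metadata": {}},
                ]}, "nullable": True, "metadata": {}},
            ]}, "nullable": True, "metadata": {}},
        ]}, "nullable": True, "metadata": {}},
    ],
}

pairs = leaf_columns(example_schema)
for path, alias in pairs:
    print(path, "as", alias)
```

With a real DataFrame, the pairs could be fed to select, for example df.select([col(p).alias(a) for p, a in leaf_columns(df.schema.jsonValue())]) with col imported from pyspark.sql.functions. Field names that themselves contain dots would need backtick quoting, which this sketch does not handle, and structs nested inside arrays are left untouched.</description>
    </item>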
    <item>
      <title>Re: Flatten Deep Nested Struct</title>
      <link>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11426#M6405</link>
      <description>&lt;P&gt;@Aidan Heffernan&lt;/P&gt;&lt;P&gt;&lt;A href="https://medium.com/@thomaspt748/how-to-flatten-json-files-dynamically-using-apache-pyspark-c6b1b5fd4777" target="test_blank"&gt;https://medium.com/@thomaspt748/how-to-flatten-json-files-dynamically-using-apache-pyspark-c6b1b5fd4777&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Here you can find a piece of code that flattens JSON based on the data type (Array or Struct).&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 08:59:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/11426#M6405</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2023-01-18T08:59:18Z</dc:date>
    </item>
    <item>
      <title>Re: Flatten Deep Nested Struct</title>
      <link>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/66520#M33144</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24571"&gt;@Aidonis&lt;/a&gt;&lt;/P&gt;&lt;P&gt;You can try this as well:&lt;BR /&gt;&lt;A href="https://pypi.org/project/flatten-spark-dataframe/" target="_blank"&gt;flatten-spark-dataframe · PyPI&lt;/A&gt;&lt;BR /&gt;It also allows flattening to a specific level.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Apr 2024 03:21:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/flatten-deep-nested-struct/m-p/66520#M33144</guid>
      <dc:creator>Praveen-bpk21</dc:creator>
      <dc:date>2024-04-18T03:21:17Z</dc:date>
    </item>
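    <item>
      <title>Re: Flatten Deep Nested Struct</title>
      <description>On flattening only to a specific level: the idea can be sketched by adding a depth limit to the recursive approach posted earlier in this thread. This is a hypothetical helper written for illustration, not the API of the flatten-spark-dataframe package; the name columns_to_depth, the max_depth parameter, and the sample schema are all assumptions, and the schema is taken in the plain-dict form that df.schema.jsonValue() returns.

```python
# Hypothetical helper, not the API of any published package: stops
# descending once max_depth levels of nesting have been expanded.
def columns_to_depth(schema_json, max_depth, prefix="", depth=0):
    """Return dotted paths, expanding structs at most max_depth levels deep."""
    paths = []
    for field in schema_json["fields"]:
        path = prefix + "." + field["name"] if prefix else field["name"]
        dtype = field["type"]
        is_struct = isinstance(dtype, dict) and dtype.get("type") == "struct"
        # max_depth=None never equals depth, so it means "no limit".
        if is_struct and depth != max_depth:
            paths.extend(columns_to_depth(dtype, max_depth, path, depth + 1))
        else:
            # Either a leaf, or a struct left intact at the depth limit.
            paths.append(path)
    return paths


# Small hand-written schema in the shape of df.schema.jsonValue().
schema = {"type": "struct", "fields": [
    {"name": "id", "type": "integer", "nullable": True, "metadata": {}},
    {"name": "lower", "type": {"type": "struct", "fields": [
        {"name": "field_a", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "upper", "type": {"type": "struct", "fields": [
            {"name": "field_A", "type": "integer", "nullable": True, "metadata": {}},
        ]}, "nullable": True, "metadata": {}},
    ]}, "nullable": True, "metadata": {}},
]}

one_level = columns_to_depth(schema, max_depth=1)
all_levels = columns_to_depth(schema, max_depth=None)
print(one_level)
print(all_levels)
```

Structs that sit at the depth limit stay as single struct columns, and passing max_depth=None expands everything. The resulting paths can be passed to df.select in the same way as in the earlier reply.</description>
    </item>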
  </channel>
</rss>

