<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic how to dynamically explode array type column in pyspark or scala in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-dynamically-explode-array-type-column-in-pyspark-or-scala/m-p/27071#M18973</link>
<description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I have a Parquet file with complex column types: nested structs and arrays.&lt;/P&gt;
&lt;P&gt;I am using the script from the link below to flatten my Parquet file.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I can flatten the schema successfully using the scripts in STEP 1 and STEP 2. In STEP 3, however, the script uses hardcoded column names to flatten the arrays. In my case there are multiple array-type columns that need to be transformed, so I cannot use that approach.&lt;/P&gt;
&lt;P&gt;Is there a way to transform all the array-type columns dynamically, without hardcoding names, since the columns may change in the future? Something like: check whether a column is of array type, explode it, and repeat for every array-type column.&lt;/P&gt;
&lt;P&gt;Please advise.&lt;/P&gt;</description>
    <pubDate>Wed, 19 Aug 2020 18:31:33 GMT</pubDate>
    <dc:creator>SatheeshSathees</dc:creator>
    <dc:date>2020-08-19T18:31:33Z</dc:date>
    <item>
      <title>how to dynamically explode array type column in pyspark or scala</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-dynamically-explode-array-type-column-in-pyspark-or-scala/m-p/27071#M18973</link>
<description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I have a Parquet file with complex column types: nested structs and arrays.&lt;/P&gt;
&lt;P&gt;I am using the script from the link below to flatten my Parquet file.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I can flatten the schema successfully using the scripts in STEP 1 and STEP 2. In STEP 3, however, the script uses hardcoded column names to flatten the arrays. In my case there are multiple array-type columns that need to be transformed, so I cannot use that approach.&lt;/P&gt;
&lt;P&gt;Is there a way to transform all the array-type columns dynamically, without hardcoding names, since the columns may change in the future? Something like: check whether a column is of array type, explode it, and repeat for every array-type column.&lt;/P&gt;
&lt;P&gt;Please advise.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Aug 2020 18:31:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-dynamically-explode-array-type-column-in-pyspark-or-scala/m-p/27071#M18973</guid>
      <dc:creator>SatheeshSathees</dc:creator>
      <dc:date>2020-08-19T18:31:33Z</dc:date>
    </item>
    <item>
      <title>Re: how to dynamically explode array type column in pyspark or scala</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-dynamically-explode-array-type-column-in-pyspark-or-scala/m-p/27072#M18974</link>
      <description>&lt;P&gt;Hello, please check out the docs and notebook below, which contain similar examples:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-complex-data-types-python.html" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-complex-data-types-python.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Sep 2020 19:39:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-dynamically-explode-array-type-column-in-pyspark-or-scala/m-p/27072#M18974</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2020-09-18T19:39:35Z</dc:date>
    </item>
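    <!-- The loop the question describes (check each column's type, explode arrays, expand
         structs, repeat) can be sketched in PySpark as below. This is an illustrative
         sketch, not the script from the linked article: the function name `flatten` and
         the `parent_child` naming convention for promoted struct fields are assumptions. -->
    <!--
    ```python
    from pyspark.sql import DataFrame
    from pyspark.sql.functions import col, explode_outer
    from pyspark.sql.types import ArrayType, StructType

    def flatten(df: DataFrame) -> DataFrame:
        """Expand struct fields and explode array columns until no
        complex (struct/array) top-level columns remain."""
        def complex_cols(frame):
            # Map of column name -> type for every struct or array column.
            return {f.name: f.dataType for f in frame.schema.fields
                    if isinstance(f.dataType, (ArrayType, StructType))}

        pending = complex_cols(df)
        while pending:
            name, dtype = next(iter(pending.items()))
            if isinstance(dtype, StructType):
                # Promote each struct field to a top-level column named parent_child.
                expanded = [col(f"{name}.{f.name}").alias(f"{name}_{f.name}")
                            for f in dtype.fields]
                df = df.select("*", *expanded).drop(name)
            else:
                # explode_outer keeps rows whose array is null or empty.
                df = df.withColumn(name, explode_outer(col(name)))
            # Re-inspect: exploding may surface new structs/arrays (e.g. array<struct>).
            pending = complex_cols(df)
        return df
    ```

    Because the schema is re-inspected after every step, nested combinations such as
    an array of structs are handled without hardcoding any column name.
    -->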
  </channel>
</rss>

