<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Union Multiple dataframes in loop, with different schema in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24427#M16973</link>
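    <description>&lt;P&gt;@Joseph Kambourakis I found a way to achieve this, using the following function:&lt;/P&gt;&lt;P&gt;def union_all(dfs):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Recursively union the first DataFrame with the union of the rest; dfs must be non-empty.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if len(dfs) &amp;gt; 1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return dfs[0].unionByName(union_all(dfs[1:]), allowMissingColumns=True)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return dfs[0]&lt;/P&gt;</description>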
    <pubDate>Thu, 31 Mar 2022 15:19:04 GMT</pubDate>
    <dc:creator>KKo</dc:creator>
    <dc:date>2022-03-31T15:19:04Z</dc:date>
    <item>
      <title>Union Multiple dataframes in loop, with different schema</title>
      <link>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24425#M16971</link>
      <description>&lt;P&gt;Within a loop I create a few dataframes. I can union them without an issue if they have the same schema, using &lt;B&gt;df_unioned = reduce(DataFrame.unionAll, df_list)&lt;/B&gt;. My problem is how to union them if one of the dataframes in df_list has a different number of columns. I thought &lt;B&gt;df_unioned = reduce(DataFrame.unionByName, df_list, allowMissingColumns=True)&lt;/B&gt; would solve the issue, but it gives me the error &lt;B&gt;reduce() takes no keyword arguments&lt;/B&gt;. Thanks in advance. Let me know if you need any more details.&lt;/P&gt;</description>
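      <!--
A minimal sketch of why the call in the question fails and one way around it. functools.reduce accepts only positional arguments (a function, an iterable, and an optional initial value), so allowMissingColumns=True has to be bound to the union call itself, for example with a lambda. FakeFrame below is a hypothetical stand-in for a Spark DataFrame (not part of any library) so the snippet runs without a cluster; with real DataFrames, the allowMissingColumns parameter of unionByName needs Spark 3.1 or later.

```python
from functools import reduce


class FakeFrame:
    """Hypothetical stand-in for a Spark DataFrame: a list of dict rows."""

    def __init__(self, rows):
        self.rows = rows
        # Preserve first-seen column order across all rows.
        self.columns = list(dict.fromkeys(k for r in rows for k in r))

    def unionByName(self, other, allowMissingColumns=False):
        if not allowMissingColumns and set(self.columns) != set(other.columns):
            raise ValueError("column sets differ")
        cols = list(dict.fromkeys(self.columns + other.columns))
        # Fill columns absent from a row with None (Spark would use null).
        merged = [{c: r.get(c) for c in cols} for r in self.rows + other.rows]
        return FakeFrame(merged)


df_list = [
    FakeFrame([{"id": 1, "name": "a"}]),
    FakeFrame([{"id": 2}]),
    FakeFrame([{"id": 3, "city": "x"}]),
]

# reduce() rejects the extra keyword argument, so this raises TypeError.
try:
    reduce(FakeFrame.unionByName, df_list, allowMissingColumns=True)
    kwarg_error = None
except TypeError as exc:
    kwarg_error = str(exc)

# Bind the keyword inside the function passed to reduce() instead.
df_unioned = reduce(
    lambda left, right: left.unionByName(right, allowMissingColumns=True),
    df_list,
)
```

With real PySpark DataFrames the same lambda can be passed to reduce() unchanged; only the FakeFrame class is an assumption made for illustration.
      -->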
      <pubDate>Mon, 28 Mar 2022 19:47:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24425#M16971</guid>
      <dc:creator>KKo</dc:creator>
      <dc:date>2022-03-28T19:47:46Z</dc:date>
    </item>
    <item>
      <title>Re: Union Multiple dataframes in loop, with different schema</title>
      <link>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24426#M16972</link>
      <description>&lt;P&gt;A plain union doesn't work if the dataframes have different schemas or column counts. If you do need to union dataframes with different schemas, add columns of nulls for anything missing so that they all share the same schema.&lt;/P&gt;</description>
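      <!--
The padding idea in this reply can be sketched in plain Python, assuming each dataframe is modelled as a list of dict rows (an assumption made so the snippet runs without Spark). The helper name align_columns is hypothetical. With real PySpark DataFrames, the padding step would typically be df.withColumn(name, lit(None).cast(dtype)) for each missing column.

```python
def align_columns(frames):
    """Pad every frame (a list of dict rows) to the same set of columns."""
    # Collect every column name, keeping first-seen order.
    all_columns = list(dict.fromkeys(
        col for rows in frames for row in rows for col in row
    ))
    # Missing columns become None, the stand-in here for a Spark null.
    return [
        [{col: row.get(col) for col in all_columns} for row in rows]
        for rows in frames
    ]


frames = [
    [{"id": 1, "name": "a"}],
    [{"id": 2, "city": "x"}],
]
aligned = align_columns(frames)

# With identical columns everywhere, a plain union is safe.
unioned = [row for rows in aligned for row in rows]
```

Once every frame has the same columns in the same order, the ordinary union from the original question applies again.
      -->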
      <pubDate>Tue, 29 Mar 2022 12:00:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24426#M16972</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-29T12:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: Union Multiple dataframes in loop, with different schema</title>
      <link>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24427#M16973</link>
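      <description>&lt;P&gt;@Joseph Kambourakis I found a way to achieve this, using the following function:&lt;/P&gt;&lt;P&gt;def union_all(dfs):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Recursively union the first DataFrame with the union of the rest; dfs must be non-empty.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if len(dfs) &amp;gt; 1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return dfs[0].unionByName(union_all(dfs[1:]), allowMissingColumns=True)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return dfs[0]&lt;/P&gt;</description>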
      <pubDate>Thu, 31 Mar 2022 15:19:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/24427#M16973</guid>
      <dc:creator>KKo</dc:creator>
      <dc:date>2022-03-31T15:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Union Multiple dataframes in loop, with different schema</title>
      <link>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/38230#M26584</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I have come across the same scenario. Using reduce() and unionByName, we can implement the solution as below:&lt;/P&gt;&lt;P&gt;val lstDF: List[DataFrame] = List(df1, df2, df3, df4, df5)&lt;/P&gt;&lt;P&gt;val combinedDF = lstDF.reduce((left, right) =&amp;gt; left.unionByName(right, allowMissingColumns = true))&lt;/P&gt;&lt;P&gt;#Scala #Spark #multiple schema&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jul 2023 03:58:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/union-multiple-dataframes-in-loop-with-different-schema/m-p/38230#M26584</guid>
      <dc:creator>anoopunni</dc:creator>
      <dc:date>2023-07-24T03:58:13Z</dc:date>
    </item>
  </channel>
</rss>

