<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to merge two data frames column-wise in Apache Spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29716#M21427</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@bhosskie and @Govind89 &lt;/P&gt;
&lt;P&gt;I think what is being asked is how to merge all of the columns. One way is to add a monotonically_increasing_id() column to each DataFrame and then join on those ids; this works only if the two DataFrames have exactly the same number of rows. The number of columns in each DataFrame can differ.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import monotonically_increasing_id

df1 = sqlContext.createDataFrame([("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo"), ("aaa", "bbb", "ccc", "ddd")], ("k", "K", "v", "V"))
df2 = sqlContext.createDataFrame([("aaa", "bbb", "ddd"), ("www", "eee", "rrr"), ("jjj", "rrr", "www")], ("m", "M", "n"))

df1 = df1.withColumn("id", monotonically_increasing_id())
df2 = df2.withColumn("id", monotonically_increasing_id())
df1.show()
df2.show()

df3 = df2.join(df1, "id", "outer").drop("id")
df3.show()
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This prints the merged columns, although the displayed row order may differ from the input order.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 15 Mar 2017 21:01:14 GMT</pubDate>
    <dc:creator>DarrellUlm</dc:creator>
    <dc:date>2017-03-15T21:01:14Z</dc:date>
    <item>
      <title>How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29714#M21425</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the following two data frames which have just one column each and have exact same number of rows. How do I merge them so that I get a new data frame which has the two columns and all rows from both the data frames. For example, &lt;/P&gt;
&lt;P&gt;&lt;B&gt;df1: &lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;+-----+
| ColA|
+-----+
|    1|
|    2|
|    3|
|    4|
+-----+
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;B&gt;df2:&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;+-----+
| ColB|
+-----+
|    5|
|    6|
|    7|
|    8|
+-----+
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I want the result of the merge to be &lt;/P&gt;
&lt;P&gt;&lt;B&gt;df3:&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;+-----+-----+
| ColA| ColB|
+-----+-----+
|    1|    5|
|    2|    6|
|    3|    7|
|    4|    8|
+-----+-----+
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I don't quite see how I can do this with the &lt;B&gt;join&lt;/B&gt; method because there is only one column and joining without any condition will create a cartesian join between the two columns. Is there a direct SPARK Data Frame API call to do this? In R Data Frames, I see that there a &lt;B&gt;merge&lt;/B&gt; function to merge two data frames. However, I don't know if it is similar to join.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Bhaskar&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 20:33:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29714#M21425</guid>
      <dc:creator>bhosskie</dc:creator>
      <dc:date>2016-05-13T20:33:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29715#M21426</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi all, I am working on the same problem. Any findings? Thanks!&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 20:11:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29715#M21426</guid>
      <dc:creator>Govind89</dc:creator>
      <dc:date>2016-10-11T20:11:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29716#M21427</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@bhosskie and @Govind89 &lt;/P&gt;
&lt;P&gt;I think what is being asked is how to merge all of the columns. One way is to add a monotonically_increasing_id() column to each DataFrame and then join on those ids; this works only if the two DataFrames have exactly the same number of rows. The number of columns in each DataFrame can differ.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import monotonically_increasing_id

df1 = sqlContext.createDataFrame([("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo"), ("aaa", "bbb", "ccc", "ddd")], ("k", "K", "v", "V"))
df2 = sqlContext.createDataFrame([("aaa", "bbb", "ddd"), ("www", "eee", "rrr"), ("jjj", "rrr", "www")], ("m", "M", "n"))

df1 = df1.withColumn("id", monotonically_increasing_id())
df2 = df2.withColumn("id", monotonically_increasing_id())
df1.show()
df2.show()

df3 = df2.join(df1, "id", "outer").drop("id")
df3.show()
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This prints the merged columns, although the displayed row order may differ from the input order.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Mar 2017 21:01:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29716#M21427</guid>
      <dc:creator>DarrellUlm</dc:creator>
      <dc:date>2017-03-15T21:01:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29717#M21428</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Thanks! This seems to work for me, although I was nervous about using monotonically_increasing_id() because the ids can differ for corresponding rows in the two DataFrames if parts of them are generated in different partitions. I suppose that if df2 is derived from df1, the chances of that happening in practice are fairly low. In any case, I use a "left_outer" join to make sure I keep all the rows of the DataFrame I consider most important.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Apr 2018 11:18:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29717#M21428</guid>
      <dc:creator>LUCASPARTRIDGE</dc:creator>
      <dc:date>2018-04-18T11:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29718#M21429</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the same problem. Is there a solution that does not use a join?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Nov 2018 14:27:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29718#M21429</guid>
      <dc:creator>AmarsanaaGanbol</dc:creator>
      <dc:date>2018-11-21T14:27:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29722#M21433</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the same problem.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Jul 2020 08:35:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29722#M21433</guid>
      <dc:creator>astrogobind11</dc:creator>
      <dc:date>2020-07-01T08:35:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to merge two data frames column-wise in Apache Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29723#M21434</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@bhosskie&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate()
sc = spark.sparkContext

sqlDF1 = spark.sql("select count(*) as Total FROM user_summary")
sqlDF2 = spark.sql("select count(*) as Total_New FROM user_summary")

df = sqlDF1.join(sqlDF2)
df.show()
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693f000007OrooAAC"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2540iC52FEACE9DCCF76C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693f000007OrooAAC" alt="0693f000007OrooAAC" /&gt;&lt;/span&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2020 17:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-merge-two-data-frames-column-wise-in-apache-spark/m-p/29723#M21434</guid>
      <dc:creator>AmolZinjade</dc:creator>
      <dc:date>2020-12-16T17:36:04Z</dc:date>
    </item>
  </channel>
</rss>

